Completed
Pull Request — master (#135)
by Chris
11:32
created

SynonameToolcode.fingerprint()   F

Complexity

Conditions 51

Size

Total Lines 195
Code Lines 128

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 110
CRAP Score 51

Importance

Changes 0
Metric  Value
eloc    128
dl      0
loc     195
ccs     110
cts     110
cp      1
rs      0
c       0
b       0
f       0
cc      51
nop     5
crap    51

How to fix

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include Extract Method.

Complexity

Complex methods like abydos.fingerprint._synoname.SynonameToolcode.fingerprint() often do a lot of different things. To break such a method and its class down, we need to identify a cohesive component within the class. A common approach to finding such a component is to look for fields or methods that share the same prefix or suffix.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
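In this class, the attributes `qual_1`, `qual_2`, and `qual_3` share a prefix and are used together only to fill field 0, so they are one plausible candidate for extraction. The sketch below assumes a hypothetical `QualifierCodes` class (not part of Abydos) that owns those phrase sets and the lookup.

```python
class QualifierCodes(object):
    """Hypothetical component grouping the qual_* tables (not in Abydos)."""

    # (field 0 digit, phrases), checked in the same order as fingerprint().
    _codes = (
        ('3', {
            'adaptation after', 'after', 'assistant of', 'assistants of',
            'circle of', 'follower of', 'imitator of', 'in the style of',
            'manner of', 'pupil of', 'school of', 'studio of', 'style of',
            'workshop of',
        }),
        ('2', {'copy after', 'copy after?', 'copy of'}),
        ('1', {
            'ascribed to', 'attributed to or copy after', 'attributed to',
            'possibly',
        }),
    )

    def code_for(self, qual):
        """Return the field 0 digit for a qualifier, or '0' if unknown."""
        qual = qual.lower()
        for code, phrases in self._codes:
            if qual in phrases:
                return code
        return '0'
```

With such a component, the three-way `if`/`elif` chain on `qual` would reduce to one call, for example `toolcode[0] = QualifierCodes().code_for(qual)`.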

# -*- coding: utf-8 -*-

# Copyright 2018 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.fingerprint._synoname.

The fingerprint.synoname module implements the Synoname toolcode.
"""

from __future__ import unicode_literals

from ._fingerprint import Fingerprint

__all__ = ['SynonameToolcode', 'synoname_toolcode']


class SynonameToolcode(Fingerprint):
    # Issue (Unused Code): The variable __class__ seems to be unused.
    """Synoname Toolcode.

    Cf. :cite:`Getty:1991,Gross:1991`.
    """

    synoname_special_table = (
        # Roman, match, extra, method
        (False, 'NONE', '', 0),
        (False, 'aine', '', 3),
        (False, 'also erroneously', '', 4),
        (False, 'also identified with the', '', 2),
        (False, 'also identified with', '', 2),
        (False, 'archbishop', '', 7),
        (False, 'atelier', '', 7),
        (False, 'baron', '', 7),
        (False, 'cadet', '', 3),
        (False, 'cardinal', '', 7),
        (False, 'circle of', '', 5),
        (False, 'circle', '', 5),
        (False, 'class of', '', 5),
        (False, 'conde de', '', 7),
        (False, 'countess', '', 7),
        (False, 'count', '', 7),
        (False, "d'", " d'", 15),
        (False, 'dai', '', 15),
        (False, "dall'", " dall'", 15),
        (False, 'dalla', '', 15),
        (False, 'dalle', '', 15),
        (False, 'dal', '', 15),
        (False, 'da', '', 15),
        (False, 'degli', '', 15),
        (False, 'della', '', 15),
        (False, 'del', '', 15),
        (False, 'den', '', 15),
        (False, 'der altere', '', 3),
        (False, 'der jungere', '', 3),
        (False, 'der', '', 15),
        (False, 'de la', '', 15),
        (False, 'des', '', 15),
        (False, "de'", " de'", 15),
        (False, 'de', '', 15),
        (False, 'di ser', '', 7),
        (False, 'di', '', 15),
        (False, 'dos', '', 15),
        (False, 'du', '', 15),
        (False, 'duke of', '', 7),
        (False, 'earl of', '', 7),
        (False, 'el', '', 15),
        (False, 'fils', '', 3),
        (False, 'florentine follower of', '', 5),
        (False, 'follower of', '', 5),
        (False, 'fra', '', 7),
        (False, 'freiherr von', '', 7),
        (False, 'giovane', '', 7),
        (False, 'group', '', 5),
        (True, 'iii', '', 3),
        (True, 'ii', '', 3),
        (False, 'il giovane', '', 7),
        (False, 'il vecchio', '', 7),
        (False, 'il', '', 15),
        (False, "in't", '', 7),
        (False, 'in het', '', 7),
        (True, 'iv', '', 3),
        (True, 'ix', '', 3),
        (True, 'i', '', 3),
        (False, 'jr.', '', 3),
        (False, 'jr', '', 3),
        (False, 'juniore', '', 3),
        (False, 'junior', '', 3),
        (False, 'king of', '', 7),
        (False, "l'", " l'", 15),
        (False, "l'aine", '', 3),
        (False, 'la', '', 15),
        (False, 'le jeune', '', 3),
        (False, 'le', '', 15),
        (False, 'lo', '', 15),
        (False, 'maestro', '', 7),
        (False, 'maitre', '', 7),
        (False, 'marchioness', '', 7),
        (False, 'markgrafin von', '', 7),
        (False, 'marquess', '', 7),
        (False, 'marquis', '', 7),
        (False, 'master of the', '', 7),
        (False, 'master of', '', 7),
        (False, 'master known as the', '', 7),
        (False, 'master with the', '', 7),
        (False, 'master with', '', 7),
        (False, 'masters', '', 7),
        (False, 'master', '', 7),
        (False, 'meister', '', 7),
        (False, 'met de', '', 7),
        (False, 'met', '', 7),
        (False, 'mlle.', '', 7),
        (False, 'mlle', '', 7),
        (False, 'monogrammist', '', 7),
        (False, 'monsu', '', 7),
        (False, 'nee', '', 2),
        (False, 'of', '', 3),
        (False, 'oncle', '', 3),
        (False, 'op den', '', 15),
        (False, 'op de', '', 15),
        (False, 'or', '', 2),
        (False, 'over den', '', 15),
        (False, 'over de', '', 15),
        (False, 'over', '', 7),
        (False, 'p.re', '', 7),
        (False, 'p.r.a.', '', 1),
        (False, 'padre', '', 7),
        (False, 'painter', '', 7),
        (False, 'pere', '', 3),
        (False, 'possibly identified with', '', 6),
        (False, 'possibly', '', 6),
        (False, 'pseudo', '', 15),
        (False, 'r.a.', '', 1),
        (False, 'reichsgraf von', '', 7),
        (False, 'ritter von', '', 7),
        (False, 'sainte-', ' sainte-', 8),
        (False, 'sainte', '', 7),
        (False, 'saint-', ' saint-', 8),
        (False, 'saint', '', 7),
        (False, 'santa', '', 15),
        (False, "sant'", " sant'", 15),
        (False, 'san', '', 15),
        (False, 'ser', '', 7),
        (False, 'seniore', '', 3),
        (False, 'senior', '', 3),
        (False, 'sir', '', 5),
        (False, 'sr.', '', 3),
        (False, 'sr', '', 3),
        (False, 'ss.', ' ss.', 14),
        (False, 'ss', '', 6),
        (False, 'st-', ' st-', 8),
        (False, 'st.', ' st.', 15),
        (False, 'ste-', ' ste-', 8),
        (False, 'ste.', ' ste.', 15),
        (False, 'studio', '', 7),
        (False, 'sub-group', '', 5),
        (False, 'sultan of', '', 7),
        (False, 'ten', '', 15),
        (False, 'ter', '', 15),
        (False, 'the elder', '', 3),
        (False, 'the younger', '', 3),
        (False, 'the', '', 7),
        (False, 'tot', '', 15),
        (False, 'unidentified', '', 1),
        (False, 'van den', '', 15),
        (False, 'van der', '', 15),
        (False, 'van de', '', 15),
        (False, 'vanden', '', 15),
        (False, 'vander', '', 15),
        (False, 'van', '', 15),
        (False, 'vecchia', '', 7),
        (False, 'vecchio', '', 7),
        (True, 'viii', '', 3),
        (True, 'vii', '', 3),
        (True, 'vi', '', 3),
        (True, 'v', '', 3),
        (False, 'vom', '', 7),
        (False, 'von', '', 15),
        (False, 'workshop', '', 7),
        (True, 'xiii', '', 3),
        (True, 'xii', '', 3),
        (True, 'xiv', '', 3),
        (True, 'xix', '', 3),
        (True, 'xi', '', 3),
        (True, 'xviii', '', 3),
        (True, 'xvii', '', 3),
        (True, 'xvi', '', 3),
        (True, 'xv', '', 3),
        (True, 'xx', '', 3),
        (True, 'x', '', 3),
        (False, 'y', '', 7),
    )

    method_dict = {
        'end': 1,
        'middle': 2,
        'beginning': 4,
        'beginning_no_space': 8,
    }

    # Fill field 0 (qualifier)
    qual_3 = {
        'adaptation after',
        'after',
        'assistant of',
        'assistants of',
        'circle of',
        'follower of',
        'imitator of',
        'in the style of',
        'manner of',
        'pupil of',
        'school of',
        'studio of',
        'style of',
        'workshop of',
    }
    qual_2 = {'copy after', 'copy after?', 'copy of'}
    qual_1 = {
        'ascribed to',
        'attributed to or copy after',
        'attributed to',
        'possibly',
    }

    # Fill field 2 (generation)
    gen_1 = (
        'the elder',
        ' sr.',
        ' sr',
        'senior',
        'der altere',
        'il vecchio',
        "l'aine",
        'p.re',
        'padre',
        'seniore',
        'vecchia',
        'vecchio',
    )
    gen_2 = (
        ' jr.',
        ' jr',
        'der jungere',
        'il giovane',
        'giovane',
        'juniore',
        'junior',
        'le jeune',
        'the younger',
    )

    def fingerprint(self, lname, fname='', qual='', normalize=0):
        # Issue (Comprehensibility): This function exceeds the maximum number of variables (26/15).
        # Issue (Bug): Parameters differ from overridden 'fingerprint' method
        """Build the Synoname toolcode.

        :param str lname: last name
        :param str fname: first name (can be blank)
        :param str qual: qualifier
        :param int normalize: normalization mode (0, 1, or 2)
        :returns: the transformed names and the synoname toolcode
        :rtype: tuple

        >>> st = SynonameToolcode()
        >>> st.fingerprint('hat')
        ('hat', '', '0000000003$$h')
        >>> st.fingerprint('niall')
        ('niall', '', '0000000005$$n')
        >>> st.fingerprint('colin')
        ('colin', '', '0000000005$$c')
        >>> st.fingerprint('atcg')
        ('atcg', '', '0000000004$$a')
        >>> st.fingerprint('entreatment')
        ('entreatment', '', '0000000011$$e')

        >>> st.fingerprint('Ste.-Marie', 'Count John II', normalize=2)
        ('ste.-marie ii', 'count john', '0200491310$015b049a127c$smcji')
        >>> st.fingerprint('Michelangelo IV', '', 'Workshop of')
        ('michelangelo iv', '', '3000550015$055b$mi')
        """
        lname = lname.lower()
        fname = fname.lower()
        qual = qual.lower()

        # Start with the basic code
        toolcode = ['0', '0', '0', '000', '00', '00', '$', '', '$', '']

        full_name = ' '.join((lname, fname))

        if qual in self.qual_3:
            toolcode[0] = '3'
        elif qual in self.qual_2:
            toolcode[0] = '2'
        elif qual in self.qual_1:
            toolcode[0] = '1'

        # Fill field 1 (punctuation)
        if '.' in full_name:
            toolcode[1] = '2'
        else:
            for punct in ',-/:;"&\'()!{|}?$%*+<=>[\\]^_`~':
                if punct in full_name:
                    toolcode[1] = '1'
                    break

        elderyounger = ''  # save elder/younger for possible movement later
        for gen in self.gen_1:
            if gen in full_name:
                toolcode[2] = '1'
                elderyounger = gen
                break
        else:
            for gen in self.gen_2:
                if gen in full_name:
                    toolcode[2] = '2'
                    elderyounger = gen
                    break

        # do comma flip
        if normalize:
            comma = lname.find(',')
            if comma != -1:
                lname_end = lname[comma + 1 :]
                # Check for an empty remainder (e.g. a trailing comma) so
                # that lname_end[0] cannot raise IndexError.
                while lname_end and lname_end[0] in {' ', ','}:
                    lname_end = lname_end[1:]
                fname = lname_end + ' ' + fname
                lname = lname[:comma].strip()

        # do elder/younger move
        if normalize == 2 and elderyounger:
            elderyounger_loc = fname.find(elderyounger)
            if elderyounger_loc != -1:
                lname = ' '.join((lname, elderyounger.strip()))
                fname = ' '.join(
                    (
                        fname[:elderyounger_loc].strip(),
                        fname[elderyounger_loc + len(elderyounger) :],
                    )
                ).strip()

        toolcode[4] = '{:02d}'.format(len(fname))
        toolcode[5] = '{:02d}'.format(len(lname))

        # strip punctuation
        for char in ',/:;"&()!{|}?$%*+<=>[\\]^_`~':
            full_name = full_name.replace(char, '')
        for pos, char in enumerate(full_name):
            if char == '-' and full_name[pos - 1 : pos + 2] != 'b-g':
                full_name = full_name[:pos] + ' ' + full_name[pos + 1 :]

        # Fill field 9 (search range)
        for letter in [_[0] for _ in full_name.split()]:
            if letter not in toolcode[9]:
                toolcode[9] += letter
            if len(toolcode[9]) == 15:
                break

        def roman_check(numeral, fname, lname):
            """Move Roman numerals from first name to last."""
            loc = fname.find(numeral)
            # The two alternatives are grouped so that the index lookup only
            # runs when the numeral was found; ungrouped, `and` binds tighter
            # than `or`, and fname[loc + len(numeral)] could raise IndexError
            # when loc == -1.
            if fname and (
                loc != -1
                # Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
                and (len(fname[loc:]) == len(numeral)
                # Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
                     or fname[loc + len(numeral)] in {' ', ','})
                # Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
            ):
                lname = ' '.join((lname, numeral))
                fname = ' '.join(
                    (
                        fname[:loc].strip(),
                        fname[loc + len(numeral) :].lstrip(' ,'),
                    )
                )
            return fname.strip(), lname.strip()

        # Fill fields 7 (specials) and 3 (roman numerals)
        for num, special in enumerate(self.synoname_special_table):
            # Issue (unused-code): Too many nested blocks (6/5)
            # Issue (unused-code): Too many nested blocks (7/5)
            roman, match, extra, method = special
            if method & self.method_dict['end']:
                match_context = ' ' + match
                loc = full_name.find(match_context)
                if (len(full_name) > len(match_context)) and (
                    loc == len(full_name) - len(match_context)
                    # Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
                ):
                    if roman:
                        if not any(
                            abbr in fname for abbr in ('i.', 'v.', 'x.')
                            # Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
                        ):
                            full_name = full_name[:loc]
                            toolcode[7] += '{:03d}'.format(num) + 'a'
                            if toolcode[3] == '000':
                                toolcode[3] = '{:03d}'.format(num)
                            if normalize == 2:
                                fname, lname = roman_check(match, fname, lname)
                    else:
                        full_name = full_name[:loc]
                        toolcode[7] += '{:03d}'.format(num) + 'a'
            if method & self.method_dict['middle']:
                match_context = ' ' + match + ' '
                loc = 0
                while loc != -1:
                    loc = full_name.find(match_context, loc + 1)
                    if loc > 0:
                        if roman:
                            if not any(
                                abbr in fname for abbr in ('i.', 'v.', 'x.')
                                # Issue (Coding Style): Wrong hanging indentation before block (add 4 spaces).
                            ):
                                full_name = (
                                    full_name[:loc]
                                    + full_name[loc + len(match) + 1 :]
                                )
                                toolcode[7] += '{:03d}'.format(num) + 'b'
                                if toolcode[3] == '000':
                                    toolcode[3] = '{:03d}'.format(num)
                                if normalize == 2:
                                    fname, lname = roman_check(
                                        match, fname, lname
                                    )
                        else:
                            full_name = (
                                full_name[:loc]
                                + full_name[loc + len(match) + 1 :]
                            )
                            toolcode[7] += '{:03d}'.format(num) + 'b'
            if method & self.method_dict['beginning']:
                match_context = match + ' '
                loc = full_name.find(match_context)
                if loc == 0:
                    full_name = full_name[len(match) + 1 :]
                    toolcode[7] += '{:03d}'.format(num) + 'c'
            if method & self.method_dict['beginning_no_space']:
                loc = full_name.find(match)
                if loc == 0:
                    toolcode[7] += '{:03d}'.format(num) + 'd'
                    if full_name[: len(match)] not in toolcode[9]:
                        toolcode[9] += full_name[: len(match)]

            if extra:
                loc = full_name.find(extra)
                if loc != -1:
                    toolcode[7] += '{:03d}'.format(num) + 'X'
                    # Since extras are unique, we only look for each of them
                    # once, and they include otherwise impossible characters
                    # for this field, it's not possible for the following line
                    # to have ever been false.
                    # if full_name[loc:loc+len(extra)] not in toolcode[9]:
                    toolcode[9] += full_name[loc : loc + len(match)]

        return lname, fname, ''.join(toolcode)


def synoname_toolcode(lname, fname='', qual='', normalize=0):
    """Build the Synoname toolcode.

    This is a wrapper for :py:meth:`SynonameToolcode.fingerprint`.

    :param str lname: last name
    :param str fname: first name (can be blank)
    :param str qual: qualifier
    :param int normalize: normalization mode (0, 1, or 2)
    :returns: the transformed names and the synoname toolcode
    :rtype: tuple

    >>> synoname_toolcode('hat')
    ('hat', '', '0000000003$$h')
    >>> synoname_toolcode('niall')
    ('niall', '', '0000000005$$n')
    >>> synoname_toolcode('colin')
    ('colin', '', '0000000005$$c')
    >>> synoname_toolcode('atcg')
    ('atcg', '', '0000000004$$a')
    >>> synoname_toolcode('entreatment')
    ('entreatment', '', '0000000011$$e')

    >>> synoname_toolcode('Ste.-Marie', 'Count John II', normalize=2)
    ('ste.-marie ii', 'count john', '0200491310$015b049a127c$smcji')
    >>> synoname_toolcode('Michelangelo IV', '', 'Workshop of')
    ('michelangelo iv', '', '3000550015$055b$mi')
    """
    return SynonameToolcode().fingerprint(lname, fname, qual, normalize)


if __name__ == '__main__':
    import doctest

    doctest.testmod()
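
The doctest outputs show the toolcode only as one concatenated string. As a rough sketch of how that string maps back onto the ten `toolcode` fields built in fingerprint(), the helper below splits it apart again. The name `split_toolcode` and the dictionary keys are hypothetical, not part of Abydos, and the sketch assumes both name lengths were below 100, so that the part before the first `$` is exactly 10 characters.

```python
def split_toolcode(toolcode):
    """Split a toolcode string into named fields (hypothetical helper)."""
    # Fields 6 and 8 are literal '$' separators around the specials field.
    head, specials, search_range = toolcode.split('$', 2)
    if len(head) != 10:
        raise ValueError('expected 10 characters before the first "$"')
    return {
        'qualifier': head[0],                # field 0
        'punctuation': head[1],              # field 1
        'generation': head[2],               # field 2
        'roman': head[3:6],                  # field 3
        'len_fname': int(head[6:8]),         # field 4
        'len_lname': int(head[8:10]),        # field 5
        # Field 7: each special is a 3-digit table index plus one letter.
        'specials': [specials[i:i + 4] for i in range(0, len(specials), 4)],
        'search_range': search_range,        # field 9
    }


fields = split_toolcode('0200491310$015b049a127c$smcji')
print(fields['roman'], fields['specials'], fields['search_range'])
```

For the 'Ste.-Marie' doctest value, the specials come out as `015b`, `049a` and `127c`; the numeric parts are indexes into synoname_special_table ('count', 'ii' and 'ste.' respectively).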