Completed
Branch master (78a222)
by Chris
14:36
created

abydos.fingerprint._synoname (rating: B)

Complexity

Total Complexity 51

Size/Duplication

Total Lines 452
Duplicated Lines 0 %

Test Coverage

Coverage 100%

Importance

Changes 0
Metric  Value
wmc     51
eloc    345
dl      0
loc     452
ccs     120
cts     120
cp      1
rs      7.92
c       0
b       0
f       0

1 Function

Rating  Name                 Duplication  Size  Complexity
F       synoname_toolcode()  0            247   51


Complexity

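Complex modules like abydos.fingerprint._synoname often do a lot of different things; here, the entire total complexity of 51 comes from one function, synoname_toolcode(). To break such code down, identify a cohesive component within it. A common approach is to look for fields, methods, or variables that share the same prefix or suffix. In synoname_toolcode(), for example, qual_1, qual_2, and qual_3 together fill toolcode field 0 (the qualifier), and gen_1 and gen_2 together fill field 2 (the generation).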
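
The prefix heuristic can be sketched in a few lines of Python. This is a minimal illustration, not a Scrutinizer feature: LOCAL_NAMES is a hand-copied, partial list of the local names in synoname_toolcode(), and group_by_prefix is a hypothetical helper that groups names by the text before their first underscore.

```python
from collections import defaultdict

# Hypothetical, hand-copied and partial list of local names from
# synoname_toolcode(), in order of first assignment (not exhaustive).
LOCAL_NAMES = [
    'method_dict', 'lname', 'fname', 'qual', 'toolcode', 'full_name',
    'qual_3', 'qual_2', 'qual_1', 'punct', 'gen_1', 'gen_2',
    'elderyounger', 'gen', 'comma', 'lname_end', 'elderyounger_loc',
    'char', 'pos', 'letter', 'roman_check', 'num', 'special', 'roman',
    'match', 'extra', 'method', 'match_context', 'loc',
]


def group_by_prefix(names, min_size=2):
    """Group names by the text before their first underscore.

    Only groups with at least ``min_size`` members are returned, since a
    lone name is not evidence of a cohesive component.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.split('_', 1)[0]].append(name)
    return {
        prefix: members
        for prefix, members in groups.items()
        if len(members) >= min_size
    }


for prefix, members in group_by_prefix(LOCAL_NAMES).items():
    print(prefix, members)
```

On this list, the qual and gen groups correspond to toolcode fields 0 and 2. Other groups, such as lname and lname_end, merely share a root word, so the output is a list of candidates to review rather than a refactoring plan.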

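Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate and is often faster. Inside a single function, the equivalent step is Extract Method: move a group of related variables, and the statements that use them, into a helper function.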

# -*- coding: utf-8 -*-

# Copyright 2018 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.fingerprint._synoname.

The fingerprint.synoname module implements the Synoname toolcode.
"""

from __future__ import unicode_literals


__all__ = ['synoname_toolcode']

_synoname_special_table = (
    # Roman, match, extra, method
    (False, 'NONE', '', 0),
    (False, 'aine', '', 3),
    (False, 'also erroneously', '', 4),
    (False, 'also identified with the', '', 2),
    (False, 'also identified with', '', 2),
    (False, 'archbishop', '', 7),
    (False, 'atelier', '', 7),
    (False, 'baron', '', 7),
    (False, 'cadet', '', 3),
    (False, 'cardinal', '', 7),
    (False, 'circle of', '', 5),
    (False, 'circle', '', 5),
    (False, 'class of', '', 5),
    (False, 'conde de', '', 7),
    (False, 'countess', '', 7),
    (False, 'count', '', 7),
    (False, "d'", " d'", 15),
    (False, 'dai', '', 15),
    (False, "dall'", " dall'", 15),
    (False, 'dalla', '', 15),
    (False, 'dalle', '', 15),
    (False, 'dal', '', 15),
    (False, 'da', '', 15),
    (False, 'degli', '', 15),
    (False, 'della', '', 15),
    (False, 'del', '', 15),
    (False, 'den', '', 15),
    (False, 'der altere', '', 3),
    (False, 'der jungere', '', 3),
    (False, 'der', '', 15),
    (False, 'de la', '', 15),
    (False, 'des', '', 15),
    (False, "de'", " de'", 15),
    (False, 'de', '', 15),
    (False, 'di ser', '', 7),
    (False, 'di', '', 15),
    (False, 'dos', '', 15),
    (False, 'du', '', 15),
    (False, 'duke of', '', 7),
    (False, 'earl of', '', 7),
    (False, 'el', '', 15),
    (False, 'fils', '', 3),
    (False, 'florentine follower of', '', 5),
    (False, 'follower of', '', 5),
    (False, 'fra', '', 7),
    (False, 'freiherr von', '', 7),
    (False, 'giovane', '', 7),
    (False, 'group', '', 5),
    (True, 'iii', '', 3),
    (True, 'ii', '', 3),
    (False, 'il giovane', '', 7),
    (False, 'il vecchio', '', 7),
    (False, 'il', '', 15),
    (False, "in't", '', 7),
    (False, 'in het', '', 7),
    (True, 'iv', '', 3),
    (True, 'ix', '', 3),
    (True, 'i', '', 3),
    (False, 'jr.', '', 3),
    (False, 'jr', '', 3),
    (False, 'juniore', '', 3),
    (False, 'junior', '', 3),
    (False, 'king of', '', 7),
    (False, "l'", " l'", 15),
    (False, "l'aine", '', 3),
    (False, 'la', '', 15),
    (False, 'le jeune', '', 3),
    (False, 'le', '', 15),
    (False, 'lo', '', 15),
    (False, 'maestro', '', 7),
    (False, 'maitre', '', 7),
    (False, 'marchioness', '', 7),
    (False, 'markgrafin von', '', 7),
    (False, 'marquess', '', 7),
    (False, 'marquis', '', 7),
    (False, 'master of the', '', 7),
    (False, 'master of', '', 7),
    (False, 'master known as the', '', 7),
    (False, 'master with the', '', 7),
    (False, 'master with', '', 7),
    (False, 'masters', '', 7),
    (False, 'master', '', 7),
    (False, 'meister', '', 7),
    (False, 'met de', '', 7),
    (False, 'met', '', 7),
    (False, 'mlle.', '', 7),
    (False, 'mlle', '', 7),
    (False, 'monogrammist', '', 7),
    (False, 'monsu', '', 7),
    (False, 'nee', '', 2),
    (False, 'of', '', 3),
    (False, 'oncle', '', 3),
    (False, 'op den', '', 15),
    (False, 'op de', '', 15),
    (False, 'or', '', 2),
    (False, 'over den', '', 15),
    (False, 'over de', '', 15),
    (False, 'over', '', 7),
    (False, 'p.re', '', 7),
    (False, 'p.r.a.', '', 1),
    (False, 'padre', '', 7),
    (False, 'painter', '', 7),
    (False, 'pere', '', 3),
    (False, 'possibly identified with', '', 6),
    (False, 'possibly', '', 6),
    (False, 'pseudo', '', 15),
    (False, 'r.a.', '', 1),
    (False, 'reichsgraf von', '', 7),
    (False, 'ritter von', '', 7),
    (False, 'sainte-', ' sainte-', 8),
    (False, 'sainte', '', 7),
    (False, 'saint-', ' saint-', 8),
    (False, 'saint', '', 7),
    (False, 'santa', '', 15),
    (False, "sant'", " sant'", 15),
    (False, 'san', '', 15),
    (False, 'ser', '', 7),
    (False, 'seniore', '', 3),
    (False, 'senior', '', 3),
    (False, 'sir', '', 5),
    (False, 'sr.', '', 3),
    (False, 'sr', '', 3),
    (False, 'ss.', ' ss.', 14),
    (False, 'ss', '', 6),
    (False, 'st-', ' st-', 8),
    (False, 'st.', ' st.', 15),
    (False, 'ste-', ' ste-', 8),
    (False, 'ste.', ' ste.', 15),
    (False, 'studio', '', 7),
    (False, 'sub-group', '', 5),
    (False, 'sultan of', '', 7),
    (False, 'ten', '', 15),
    (False, 'ter', '', 15),
    (False, 'the elder', '', 3),
    (False, 'the younger', '', 3),
    (False, 'the', '', 7),
    (False, 'tot', '', 15),
    (False, 'unidentified', '', 1),
    (False, 'van den', '', 15),
    (False, 'van der', '', 15),
    (False, 'van de', '', 15),
    (False, 'vanden', '', 15),
    (False, 'vander', '', 15),
    (False, 'van', '', 15),
    (False, 'vecchia', '', 7),
    (False, 'vecchio', '', 7),
    (True, 'viii', '', 3),
    (True, 'vii', '', 3),
    (True, 'vi', '', 3),
    (True, 'v', '', 3),
    (False, 'vom', '', 7),
    (False, 'von', '', 15),
    (False, 'workshop', '', 7),
    (True, 'xiii', '', 3),
    (True, 'xii', '', 3),
    (True, 'xiv', '', 3),
    (True, 'xix', '', 3),
    (True, 'xi', '', 3),
    (True, 'xviii', '', 3),
    (True, 'xvii', '', 3),
    (True, 'xvi', '', 3),
    (True, 'xv', '', 3),
    (True, 'xx', '', 3),
    (True, 'x', '', 3),
    (False, 'y', '', 7),
)


def synoname_toolcode(lname, fname='', qual='', normalize=0):
    """Build the Synoname toolcode.

    Cf. :cite:`Getty:1991,Gross:1991`.

    :param str lname: last name
    :param str fname: first name (can be blank)
    :param str qual: qualifier
    :param int normalize: normalization mode: 0 for none; 1 to move any
        text after a comma in lname to the front of fname; 2 to do that
        and also move elder/younger markers and Roman numerals from fname
        to lname
    :returns: the transformed last and first names and the Synoname toolcode
    :rtype: tuple

    >>> synoname_toolcode('hat')
    ('hat', '', '0000000003$$h')
    >>> synoname_toolcode('niall')
    ('niall', '', '0000000005$$n')
    >>> synoname_toolcode('colin')
    ('colin', '', '0000000005$$c')
    >>> synoname_toolcode('atcg')
    ('atcg', '', '0000000004$$a')
    >>> synoname_toolcode('entreatment')
    ('entreatment', '', '0000000011$$e')

    >>> synoname_toolcode('Ste.-Marie', 'Count John II', normalize=2)
    ('ste.-marie ii', 'count john', '0200491310$015b049a127c$smcji')
    >>> synoname_toolcode('Michelangelo IV', '', 'Workshop of')
    ('michelangelo iv', '', '3000550015$055b$mi')
    """
    method_dict = {
        'end': 1,
        'middle': 2,
        'beginning': 4,
        'beginning_no_space': 8,
    }

    lname = lname.lower()
    fname = fname.lower()
    qual = qual.lower()

    # Start with the basic code
    toolcode = ['0', '0', '0', '000', '00', '00', '$', '', '$', '']

    full_name = ' '.join((lname, fname))

    # Fill field 0 (qualifier)
    qual_3 = {
        'adaptation after',
        'after',
        'assistant of',
        'assistants of',
        'circle of',
        'follower of',
        'imitator of',
        'in the style of',
        'manner of',
        'pupil of',
        'school of',
        'studio of',
        'style of',
        'workshop of',
    }
    qual_2 = {'copy after', 'copy after?', 'copy of'}
    qual_1 = {
        'ascribed to',
        'attributed to or copy after',
        'attributed to',
        'possibly',
    }

    if qual in qual_3:
        toolcode[0] = '3'
    elif qual in qual_2:
        toolcode[0] = '2'
    elif qual in qual_1:
        toolcode[0] = '1'

    # Fill field 1 (punctuation)
    if '.' in full_name:
        toolcode[1] = '2'
    else:
        for punct in ',-/:;"&\'()!{|}?$%*+<=>[\\]^_`~':
            if punct in full_name:
                toolcode[1] = '1'
                break

    # Fill field 2 (generation)
    gen_1 = (
        'the elder',
        ' sr.',
        ' sr',
        'senior',
        'der altere',
        'il vecchio',
        "l'aine",
        'p.re',
        'padre',
        'seniore',
        'vecchia',
        'vecchio',
    )
    gen_2 = (
        ' jr.',
        ' jr',
        'der jungere',
        'il giovane',
        'giovane',
        'juniore',
        'junior',
        'le jeune',
        'the younger',
    )

    elderyounger = ''  # save elder/younger for possible movement later
    for gen in gen_1:
        if gen in full_name:
            toolcode[2] = '1'
            elderyounger = gen
            break
    else:
        for gen in gen_2:
            if gen in full_name:
                toolcode[2] = '2'
                elderyounger = gen
                break

    # do comma flip
    if normalize:
        comma = lname.find(',')
        if comma != -1:
            # lstrip() drops any spaces/commas after the comma and is also
            # safe when lname ends in a comma (no IndexError on '').
            lname_end = lname[comma + 1 :].lstrip(' ,')
            fname = lname_end + ' ' + fname
            lname = lname[:comma].strip()

    # do elder/younger move
    if normalize == 2 and elderyounger:
        elderyounger_loc = fname.find(elderyounger)
        if elderyounger_loc != -1:
            lname = ' '.join((lname, elderyounger.strip()))
            fname = ' '.join(
                (
                    fname[:elderyounger_loc].strip(),
                    fname[elderyounger_loc + len(elderyounger) :],
                )
            ).strip()

    toolcode[4] = '{:02d}'.format(len(fname))
    toolcode[5] = '{:02d}'.format(len(lname))

    # strip punctuation
    for char in ',/:;"&()!{|}?$%*+<=>[\\]^_`~':
        full_name = full_name.replace(char, '')
    for pos, char in enumerate(full_name):
        if char == '-' and full_name[pos - 1 : pos + 2] != 'b-g':
            full_name = full_name[:pos] + ' ' + full_name[pos + 1 :]

    # Fill field 9 (search range)
    for letter in [_[0] for _ in full_name.split()]:
        if letter not in toolcode[9]:
            toolcode[9] += letter
        if len(toolcode[9]) == 15:
            break

    def roman_check(numeral, fname, lname):
        """Move Roman numerals from first name to last."""
        loc = fname.find(numeral)
        # Move the numeral only if it occurs in fname and either ends fname
        # or is followed by a space or comma.
        if (
            fname
            and loc != -1
            and (
                len(fname[loc:]) == len(numeral)
                or fname[loc + len(numeral)] in {' ', ','}
            )
        ):
            lname = ' '.join((lname, numeral))
            fname = ' '.join(
                (fname[:loc].strip(), fname[loc + len(numeral) :].lstrip(' ,'))
            )
        return fname.strip(), lname.strip()

    # Fill fields 7 (specials) and 3 (roman numerals)
    for num, special in enumerate(_synoname_special_table):
        roman, match, extra, method = special
        if method & method_dict['end']:
            match_context = ' ' + match
            loc = full_name.find(match_context)
            if (len(full_name) > len(match_context)) and (
                loc == len(full_name) - len(match_context)
            ):
                if roman:
                    if not any(abbr in fname for abbr in ('i.', 'v.', 'x.')):
                        full_name = full_name[:loc]
                        toolcode[7] += '{:03d}'.format(num) + 'a'
                        if toolcode[3] == '000':
                            toolcode[3] = '{:03d}'.format(num)
                        if normalize == 2:
                            fname, lname = roman_check(match, fname, lname)
                else:
                    full_name = full_name[:loc]
                    toolcode[7] += '{:03d}'.format(num) + 'a'
        if method & method_dict['middle']:
            match_context = ' ' + match + ' '
            loc = 0
            while loc != -1:
                loc = full_name.find(match_context, loc + 1)
                if loc > 0:
                    if roman:
                        if not any(
                            abbr in fname for abbr in ('i.', 'v.', 'x.')
                        ):
                            full_name = (
                                full_name[:loc]
                                + full_name[loc + len(match) + 1 :]
                            )
                            toolcode[7] += '{:03d}'.format(num) + 'b'
                            if toolcode[3] == '000':
                                toolcode[3] = '{:03d}'.format(num)
                            if normalize == 2:
                                fname, lname = roman_check(match, fname, lname)
                    else:
                        full_name = (
                            full_name[:loc] + full_name[loc + len(match) + 1 :]
                        )
                        toolcode[7] += '{:03d}'.format(num) + 'b'
        if method & method_dict['beginning']:
            match_context = match + ' '
            loc = full_name.find(match_context)
            if loc == 0:
                full_name = full_name[len(match) + 1 :]
                toolcode[7] += '{:03d}'.format(num) + 'c'
        if method & method_dict['beginning_no_space']:
            loc = full_name.find(match)
            if loc == 0:
                toolcode[7] += '{:03d}'.format(num) + 'd'
                if full_name[: len(match)] not in toolcode[9]:
                    toolcode[9] += full_name[: len(match)]

        if extra:
            loc = full_name.find(extra)
            if loc != -1:
                toolcode[7] += '{:03d}'.format(num) + 'X'
                # Each extra is unique and is searched for only once, and
                # extras contain characters that cannot otherwise appear in
                # this field, so the following check could never have been
                # false:
                # if full_name[loc:loc+len(extra)] not in toolcode[9]:
                toolcode[9] += full_name[loc : loc + len(match)]

    return lname, fname, ''.join(toolcode)


if __name__ == '__main__':
    import doctest

    doctest.testmod()
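
The returned toolcode packs the ten entries of the toolcode list into one string. The sketch below splits it back into named fields, assuming the layout implied by the initial toolcode list and the '{:02d}' length fields (so both names must be shorter than 100 characters). parse_toolcode, _TOOLCODE_RE, _SPECIAL_POSITIONS and the field names are hypothetical, not abydos API. The letter after each three-digit table index in the specials field records which matching method fired: a for end, b for middle, c for beginning, d for beginning_no_space, and X for an extra.

```python
import re

# Hypothetical helper (not abydos API). Field widths are inferred from the
# initial toolcode list ['0', '0', '0', '000', '00', '00', '$', '', '$', ''];
# assumes both name lengths fit in two digits.
_TOOLCODE_RE = re.compile(
    r'(?P<qualifier>\d)(?P<punctuation>\d)(?P<generation>\d)'
    r'(?P<roman>\d{3})(?P<fname_len>\d{2})(?P<lname_len>\d{2})'
    r'\$(?P<specials>[^$]*)\$(?P<search_range>[^$]*)\Z'
)

# Letter appended after each 3-digit table index in the specials field.
_SPECIAL_POSITIONS = {
    'a': 'end',
    'b': 'middle',
    'c': 'beginning',
    'd': 'beginning_no_space',
    'X': 'extra',
}


def parse_toolcode(code):
    """Split a Synoname toolcode string into a dict of named fields."""
    found = _TOOLCODE_RE.match(code)
    if found is None:
        raise ValueError('not a toolcode: {!r}'.format(code))
    fields = found.groupdict()
    for key in ('roman', 'fname_len', 'lname_len'):
        fields[key] = int(fields[key])
    specials = fields['specials']
    if len(specials) % 4:
        raise ValueError('malformed specials field: {!r}'.format(specials))
    # Each entry: index into _synoname_special_table plus a method letter.
    fields['specials'] = [
        (int(specials[i : i + 3]), _SPECIAL_POSITIONS[specials[i + 3]])
        for i in range(0, len(specials), 4)
    ]
    return fields
```

For example, the doctest value '0200491310$015b049a127c$smcji' parses to roman 49 (the 'ii' entry), first- and last-name lengths 13 and 10, and three specials: 'count' (15) in the middle, 'ii' (49) at the end, and 'ste.' (127) at the beginning. The lengths are recorded before roman_check moves 'ii' to the last name, which is why they do not match the returned names.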