Test Failed
Push — master ( 64abe2...a464fa )
by Chris
04:02 queued 11s
created

abydos.fingerprint.synoname.synoname_toolcode()   F

Complexity

Conditions 51

Size

Total Lines 195
Code Lines 132

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
cc 51
eloc 132
nop 4
dl 0
loc 195
rs 0
c 0
b 0
f 0

How to fix

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

A commonly applied refactoring here is Extract Method: move a block of statements into its own named function.
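As a minimal sketch of this kind of extraction, the block under the "# Fill field 0 (qualifier)" comment in the listing could become its own function. The name qualifier_code and the module-level constant names are hypothetical and not part of Abydos; the qualifier strings are copied from synoname_toolcode().

```python
# Hypothetical module-level constants; the strings are taken from the
# qual_3, qual_2 and qual_1 sets inside synoname_toolcode().
QUAL_3 = frozenset({
    'adaptation after', 'after', 'assistant of', 'assistants of',
    'circle of', 'follower of', 'imitator of', 'in the style of',
    'manner of', 'pupil of', 'school of', 'studio of',
    'style of', 'workshop of'})
QUAL_2 = frozenset({'copy after', 'copy after?', 'copy of'})
QUAL_1 = frozenset({'ascribed to', 'attributed to or copy after',
                    'attributed to', 'possibly'})


def qualifier_code(qual):
    """Return the field 0 (qualifier) digit for a lowercased qualifier."""
    if qual in QUAL_3:
        return '3'
    if qual in QUAL_2:
        return '2'
    if qual in QUAL_1:
        return '1'
    return '0'  # same default that toolcode[0] starts with
```

Inside synoname_toolcode() the whole if/elif chain would then reduce to a single assignment, toolcode[0] = qualifier_code(qual), and the comment becomes the function name.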

Complexity

Complex functions like abydos.fingerprint.synoname.synoname_toolcode() often do a lot of different things. To break such a function down, we need to identify cohesive steps within it. A common approach to finding such a step is to look for statements that read and write the same local variables, or that sit under a single explanatory comment.

Once you have determined the statements that belong together, you can apply the Extract Method refactoring. If several extracted functions end up sharing the same state, Extract Class is also a candidate.
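Since synoname_toolcode() is a function rather than a class, its cohesive components are the commented steps in its body. The following is a hedged sketch of two of those steps (fields 1 and 2) pulled out as pure functions. The names punctuation_code, generation_code, PUNCTUATION, GEN_1 and GEN_2 are hypothetical; the literals are copied from the listing.

```python
# Hypothetical constants; values copied from synoname_toolcode().
PUNCTUATION = ',-/:;"&\'()!{|}?$%*+<=>[\\]^_`~'
GEN_1 = ('the elder', ' sr.', ' sr', 'senior', 'der altere', 'il vecchio',
         "l'aine", 'p.re', 'padre', 'seniore', 'vecchia', 'vecchio')
GEN_2 = (' jr.', ' jr', 'der jungere', 'il giovane', 'giovane', 'juniore',
         'junior', 'le jeune', 'the younger')


def punctuation_code(full_name):
    """Return the field 1 digit: '2' for a period, '1' for other marks."""
    if '.' in full_name:
        return '2'
    if any(punct in full_name for punct in PUNCTUATION):
        return '1'
    return '0'


def generation_code(full_name):
    """Return (field 2 digit, matched marker); elder markers are tried first."""
    for digit, markers in (('1', GEN_1), ('2', GEN_2)):
        for gen in markers:
            if gen in full_name:
                return digit, gen
    return '0', ''
```

The caller could unpack the second result as toolcode[2], elderyounger = generation_code(full_name), which keeps the matched marker available for the later elder/younger move.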

# -*- coding: utf-8 -*-

# Copyright 2018 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.fingerprint.

The fingerprint.synoname module implements the Synoname toolcode.
"""

from __future__ import unicode_literals


__all__ = ['synoname_toolcode']

_synoname_special_table = (
Coding Style (Naming): Constant name "_synoname_special_table" doesn't conform to UPPER_CASE naming style ('(([A-Z_][A-Z0-9_]*)|(__.*__))$' pattern)

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.
    # Roman, match, extra, method
    (False, 'NONE', '', 0),
    (False, 'aine', '', 3),
    (False, 'also erroneously', '', 4),
    (False, 'also identified with the', '', 2),
    (False, 'also identified with', '', 2),
    (False, 'archbishop', '', 7),
    (False, 'atelier', '', 7),
    (False, 'baron', '', 7),
    (False, 'cadet', '', 3),
    (False, 'cardinal', '', 7),
    (False, 'circle of', '', 5),
    (False, 'circle', '', 5),
    (False, 'class of', '', 5),
    (False, 'conde de', '', 7),
    (False, 'countess', '', 7),
    (False, 'count', '', 7),
    (False, "d'", " d'", 15),
    (False, 'dai', '', 15),
    (False, "dall'", " dall'", 15),
    (False, 'dalla', '', 15),
    (False, 'dalle', '', 15),
    (False, 'dal', '', 15),
    (False, 'da', '', 15),
    (False, 'degli', '', 15),
    (False, 'della', '', 15),
    (False, 'del', '', 15),
    (False, 'den', '', 15),
    (False, 'der altere', '', 3),
    (False, 'der jungere', '', 3),
    (False, 'der', '', 15),
    (False, 'de la', '', 15),
    (False, 'des', '', 15),
    (False, "de'", " de'", 15),
    (False, 'de', '', 15),
    (False, 'di ser', '', 7),
    (False, 'di', '', 15),
    (False, 'dos', '', 15),
    (False, 'du', '', 15),
    (False, 'duke of', '', 7),
    (False, 'earl of', '', 7),
    (False, 'el', '', 15),
    (False, 'fils', '', 3),
    (False, 'florentine follower of', '', 5),
    (False, 'follower of', '', 5),
    (False, 'fra', '', 7),
    (False, 'freiherr von', '', 7),
    (False, 'giovane', '', 7),
    (False, 'group', '', 5),
    (True, 'iii', '', 3),
    (True, 'ii', '', 3),
    (False, 'il giovane', '', 7),
    (False, 'il vecchio', '', 7),
    (False, 'il', '', 15),
    (False, "in't", '', 7),
    (False, 'in het', '', 7),
    (True, 'iv', '', 3),
    (True, 'ix', '', 3),
    (True, 'i', '', 3),
    (False, 'jr.', '', 3),
    (False, 'jr', '', 3),
    (False, 'juniore', '', 3),
    (False, 'junior', '', 3),
    (False, 'king of', '', 7),
    (False, "l'", " l'", 15),
    (False, "l'aine", '', 3),
    (False, 'la', '', 15),
    (False, 'le jeune', '', 3),
    (False, 'le', '', 15),
    (False, 'lo', '', 15),
    (False, 'maestro', '', 7),
    (False, 'maitre', '', 7),
    (False, 'marchioness', '', 7),
    (False, 'markgrafin von', '', 7),
    (False, 'marquess', '', 7),
    (False, 'marquis', '', 7),
    (False, 'master of the', '', 7),
    (False, 'master of', '', 7),
    (False, 'master known as the', '', 7),
    (False, 'master with the', '', 7),
    (False, 'master with', '', 7),
    (False, 'masters', '', 7),
    (False, 'master', '', 7),
    (False, 'meister', '', 7),
    (False, 'met de', '', 7),
    (False, 'met', '', 7),
    (False, 'mlle.', '', 7),
    (False, 'mlle', '', 7),
    (False, 'monogrammist', '', 7),
    (False, 'monsu', '', 7),
    (False, 'nee', '', 2),
    (False, 'of', '', 3),
    (False, 'oncle', '', 3),
    (False, 'op den', '', 15),
    (False, 'op de', '', 15),
    (False, 'or', '', 2),
    (False, 'over den', '', 15),
    (False, 'over de', '', 15),
    (False, 'over', '', 7),
    (False, 'p.re', '', 7),
    (False, 'p.r.a.', '', 1),
    (False, 'padre', '', 7),
    (False, 'painter', '', 7),
    (False, 'pere', '', 3),
    (False, 'possibly identified with', '', 6),
    (False, 'possibly', '', 6),
    (False, 'pseudo', '', 15),
    (False, 'r.a.', '', 1),
    (False, 'reichsgraf von', '', 7),
    (False, 'ritter von', '', 7),
    (False, 'sainte-', ' sainte-', 8),
    (False, 'sainte', '', 7),
    (False, 'saint-', ' saint-', 8),
    (False, 'saint', '', 7),
    (False, 'santa', '', 15),
    (False, "sant'", " sant'", 15),
    (False, 'san', '', 15),
    (False, 'ser', '', 7),
    (False, 'seniore', '', 3),
    (False, 'senior', '', 3),
    (False, 'sir', '', 5),
    (False, 'sr.', '', 3),
    (False, 'sr', '', 3),
    (False, 'ss.', ' ss.', 14),
    (False, 'ss', '', 6),
    (False, 'st-', ' st-', 8),
    (False, 'st.', ' st.', 15),
    (False, 'ste-', ' ste-', 8),
    (False, 'ste.', ' ste.', 15),
    (False, 'studio', '', 7),
    (False, 'sub-group', '', 5),
    (False, 'sultan of', '', 7),
    (False, 'ten', '', 15),
    (False, 'ter', '', 15),
    (False, 'the elder', '', 3),
    (False, 'the younger', '', 3),
    (False, 'the', '', 7),
    (False, 'tot', '', 15),
    (False, 'unidentified', '', 1),
    (False, 'van den', '', 15),
    (False, 'van der', '', 15),
    (False, 'van de', '', 15),
    (False, 'vanden', '', 15),
    (False, 'vander', '', 15),
    (False, 'van', '', 15),
    (False, 'vecchia', '', 7),
    (False, 'vecchio', '', 7),
    (True, 'viii', '', 3),
    (True, 'vii', '', 3),
    (True, 'vi', '', 3),
    (True, 'v', '', 3),
    (False, 'vom', '', 7),
    (False, 'von', '', 15),
    (False, 'workshop', '', 7),
    (True, 'xiii', '', 3),
    (True, 'xii', '', 3),
    (True, 'xiv', '', 3),
    (True, 'xix', '', 3),
    (True, 'xi', '', 3),
    (True, 'xviii', '', 3),
    (True, 'xvii', '', 3),
    (True, 'xvi', '', 3),
    (True, 'xv', '', 3),
    (True, 'xx', '', 3),
    (True, 'x', '', 3),
    (False, 'y', '', 7)
)


def synoname_toolcode(lname, fname='', qual='', normalize=0):
Comprehensibility: This function exceeds the maximum number of variables (30/15).
    """Build the Synoname toolcode.

    Cf. :cite:`Getty:1991,Gross:1991`.

    :param str lname: last name
    :param str fname: first name (can be blank)
    :param str qual: qualifier
    :param int normalize: normalization mode (0, 1, or 2)
    :returns: the transformed last and first names and the synoname toolcode
    :rtype: tuple

    >>> synoname_toolcode('hat')
    ('hat', '', '0000000003$$h')
    >>> synoname_toolcode('niall')
    ('niall', '', '0000000005$$n')
    >>> synoname_toolcode('colin')
    ('colin', '', '0000000005$$c')
    >>> synoname_toolcode('atcg')
    ('atcg', '', '0000000004$$a')
    >>> synoname_toolcode('entreatment')
    ('entreatment', '', '0000000011$$e')

    >>> synoname_toolcode('Ste.-Marie', 'Count John II', normalize=2)
    ('ste.-marie ii', 'count john', '0200491310$015b049a127c$smcji')
    >>> synoname_toolcode('Michelangelo IV', '', 'Workshop of')
    ('michelangelo iv', '', '3000550015$055b$mi')
    """
    method_dict = {'end': 1, 'middle': 2, 'beginning': 4,
                             'beginning_no_space': 8}
Coding Style: Wrong continued indentation (remove 10 spaces).

    lname = lname.lower()
    fname = fname.lower()
    qual = qual.lower()

    # Start with the basic code
    toolcode = ['0', '0', '0', '000', '00', '00', '$', '', '$', '']

    full_name = ' '.join((lname, fname))

    # Fill field 0 (qualifier)
    qual_3 = {'adaptation after', 'after', 'assistant of', 'assistants of',
              'circle of', 'follower of', 'imitator of', 'in the style of',
              'manner of', 'pupil of', 'school of', 'studio of',
              'style of', 'workshop of'}
    qual_2 = {'copy after', 'copy after?', 'copy of'}
    qual_1 = {'ascribed to', 'attributed to or copy after',
              'attributed to', 'possibly'}

    if qual in qual_3:
        toolcode[0] = '3'
    elif qual in qual_2:
        toolcode[0] = '2'
    elif qual in qual_1:
        toolcode[0] = '1'

    # Fill field 1 (punctuation)
    if '.' in full_name:
        toolcode[1] = '2'
    else:
        for punct in ',-/:;"&\'()!{|}?$%*+<=>[\\]^_`~':
            if punct in full_name:
                toolcode[1] = '1'
                break

    # Fill field 2 (generation)
    gen_1 = ('the elder', ' sr.', ' sr', 'senior', 'der altere', 'il vecchio',
             "l'aine", 'p.re', 'padre', 'seniore', 'vecchia', 'vecchio')
    gen_2 = (' jr.', ' jr', 'der jungere', 'il giovane', 'giovane', 'juniore',
             'junior', 'le jeune', 'the younger')

    elderyounger = ''  # save elder/younger for possible movement later
    for gen in gen_1:
        if gen in full_name:
            toolcode[2] = '1'
            elderyounger = gen
            break
    else:
        for gen in gen_2:
            if gen in full_name:
                toolcode[2] = '2'
                elderyounger = gen
                break

    # do comma flip
    if normalize:
        comma = lname.find(',')
        if comma != -1:
            # lstrip() drops the same leading spaces and commas as the
            # original while loop, but cannot raise IndexError when
            # nothing follows the comma (e.g. lname == 'smith,').
            lname_end = lname[comma + 1:].lstrip(' ,')
            fname = lname_end + ' ' + fname
            lname = lname[:comma].strip()

    # do elder/younger move
    if normalize == 2 and elderyounger:
        elderyounger_loc = fname.find(elderyounger)
        if elderyounger_loc != -1:
            lname = ' '.join((lname, elderyounger.strip()))
            fname = ' '.join((fname[:elderyounger_loc].strip(),
                              fname[elderyounger_loc +
                                    len(elderyounger):])).strip()

    toolcode[4] = '{:02d}'.format(len(fname))
    toolcode[5] = '{:02d}'.format(len(lname))

    # strip punctuation
    for char in ',/:;"&()!{|}?$%*+<=>[\\]^_`~':
        full_name = full_name.replace(char, '')
    for pos, char in enumerate(full_name):
        if char == '-' and full_name[pos - 1:pos + 2] != 'b-g':
            full_name = full_name[:pos] + ' ' + full_name[pos + 1:]

    # Fill field 9 (search range)
    for letter in [_[0] for _ in full_name.split()]:
        if letter not in toolcode[9]:
            toolcode[9] += letter
        if len(toolcode[9]) == 15:
            break

    def roman_check(numeral, fname, lname):
        """Move Roman numerals from first name to last."""
        loc = fname.find(numeral)
        # Require loc != -1 before indexing fname. As originally
        # parenthesized, `or` ran the index test even when the numeral
        # was absent, which could raise IndexError or match by accident.
        if fname and loc != -1 and (
                len(fname[loc:]) == len(numeral) or
                fname[loc + len(numeral)] in {' ', ','}):
            lname = ' '.join((lname, numeral))
            fname = ' '.join((fname[:loc].strip(),
                              fname[loc + len(numeral):].lstrip(' ,')))
        return fname.strip(), lname.strip()

    # Fill fields 7 (specials) and 3 (roman numerals)
    for num, special in enumerate(_synoname_special_table):
unused-code: Too many nested blocks (6/5)
unused-code: Too many nested blocks (7/5)
        roman, match, extra, method = special
        if method & method_dict['end']:
            match_context = ' ' + match
            loc = full_name.find(match_context)
            if ((len(full_name) > len(match_context)) and
                    (loc == len(full_name) - len(match_context))):
                if roman:
                    if not any(abbr in fname for abbr in ('i.', 'v.', 'x.')):
                        full_name = full_name[:loc]
                        toolcode[7] += '{:03d}'.format(num) + 'a'
                        if toolcode[3] == '000':
                            toolcode[3] = '{:03d}'.format(num)
                        if normalize == 2:
                            fname, lname = roman_check(match, fname, lname)
                else:
                    full_name = full_name[:loc]
                    toolcode[7] += '{:03d}'.format(num) + 'a'
        if method & method_dict['middle']:
            match_context = ' ' + match + ' '
            loc = 0
            while loc != -1:
                loc = full_name.find(match_context, loc+1)
                if loc > 0:
                    if roman:
                        if not any(abbr in fname for abbr in
                                   ('i.', 'v.', 'x.')):
                            full_name = (full_name[:loc] +
                                         full_name[loc + len(match) + 1:])
                            toolcode[7] += '{:03d}'.format(num) + 'b'
                            if toolcode[3] == '000':
                                toolcode[3] = '{:03d}'.format(num)
                            if normalize == 2:
                                fname, lname = roman_check(match, fname, lname)
                    else:
                        full_name = (full_name[:loc] +
                                     full_name[loc + len(match) + 1:])
                        toolcode[7] += '{:03d}'.format(num) + 'b'
        if method & method_dict['beginning']:
            match_context = match + ' '
            loc = full_name.find(match_context)
            if loc == 0:
                full_name = full_name[len(match) + 1:]
                toolcode[7] += '{:03d}'.format(num) + 'c'
        if method & method_dict['beginning_no_space']:
            loc = full_name.find(match)
            if loc == 0:
                toolcode[7] += '{:03d}'.format(num) + 'd'
                if full_name[:len(match)] not in toolcode[9]:
                    toolcode[9] += full_name[:len(match)]

        if extra:
            loc = full_name.find(extra)
            if loc != -1:
                toolcode[7] += '{:03d}'.format(num) + 'X'
                # Since extras are unique, we only look for each of them
                # once, and they include otherwise impossible characters for
                # this field, it's not possible for the following line to have
                # ever been false.
                # if full_name[loc:loc+len(extra)] not in toolcode[9]:
                toolcode[9] += full_name[loc:loc+len(match)]

    return lname, fname, ''.join(toolcode)


if __name__ == '__main__':
    import doctest
    doctest.testmod()