abydos.fingerprint._synoname.synoname_toolcode() - Code Metrics - Inspection of "78a222a9f7d8976f6744d263e3d6d01a2a991c27" - chrislit/abydos

Completed

Branch — master (78a222)

by Chris

created 2018-10-26 11:30 UTC

abydos.fingerprint._synoname.synoname_toolcode() F

↳ Parent: abydos.fingerprint._synoname

Complexity

Conditions

Size

Total Lines	247
Code Lines	172

Duplication

Lines	0
Ratio	0 %

Code Coverage

Tests	116
CRAP Score	51

Importance

Changes

Metric	Value
eloc	172
dl	0
loc	247
ccs	116
cts	116
cp	1
rs	0
c	0
b	0
f	0
cc	51
nop	4
crap	51
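
The report does not say how the CRAP score is derived. As a minimal sketch, assuming it follows the usual Change Risk Anti-Patterns formula (complexity squared, times the uncovered fraction cubed, plus complexity), and assuming `cc` is the cyclomatic complexity and `cp` the covered fraction, the values in the table are consistent with each other. The function name `crap_score` is hypothetical.

```python
def crap_score(complexity, coverage):
    """Return the CRAP score for a function.

    complexity: cyclomatic complexity (assumed to be the `cc` metric).
    coverage: covered fraction between 0.0 and 1.0 (assumed to be `cp`).
    """
    uncovered = 1.0 - coverage
    return complexity ** 2 * uncovered ** 3 + complexity


# With full coverage the uncovered term vanishes, so the score equals
# the complexity itself, as in the table (cc 51, cp 1, crap 51).
print(crap_score(51, 1.0))
```

Under this formula, tests alone cannot push the score below the complexity; only splitting the function up lowers it further.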


# -*- coding: utf-8 -*-

# Copyright 2018 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.fingerprint._synoname.

The fingerprint.synoname module implements the Synoname toolcode.
"""

from __future__ import unicode_literals


__all__ = ['synoname_toolcode']

_synoname_special_table = (
    # Roman, match, extra, method
    (False, 'NONE', '', 0),
    (False, 'aine', '', 3),
    (False, 'also erroneously', '', 4),
    (False, 'also identified with the', '', 2),
    (False, 'also identified with', '', 2),
    (False, 'archbishop', '', 7),
    (False, 'atelier', '', 7),
    (False, 'baron', '', 7),
    (False, 'cadet', '', 3),
    (False, 'cardinal', '', 7),
    (False, 'circle of', '', 5),
    (False, 'circle', '', 5),
    (False, 'class of', '', 5),
    (False, 'conde de', '', 7),
    (False, 'countess', '', 7),
    (False, 'count', '', 7),
    (False, "d'", " d'", 15),
    (False, 'dai', '', 15),
    (False, "dall'", " dall'", 15),
    (False, 'dalla', '', 15),
    (False, 'dalle', '', 15),
    (False, 'dal', '', 15),
    (False, 'da', '', 15),
    (False, 'degli', '', 15),
    (False, 'della', '', 15),
    (False, 'del', '', 15),
    (False, 'den', '', 15),
    (False, 'der altere', '', 3),
    (False, 'der jungere', '', 3),
    (False, 'der', '', 15),
    (False, 'de la', '', 15),
    (False, 'des', '', 15),
    (False, "de'", " de'", 15),
    (False, 'de', '', 15),
    (False, 'di ser', '', 7),
    (False, 'di', '', 15),
    (False, 'dos', '', 15),
    (False, 'du', '', 15),
    (False, 'duke of', '', 7),
    (False, 'earl of', '', 7),
    (False, 'el', '', 15),
    (False, 'fils', '', 3),
    (False, 'florentine follower of', '', 5),
    (False, 'follower of', '', 5),
    (False, 'fra', '', 7),
    (False, 'freiherr von', '', 7),
    (False, 'giovane', '', 7),
    (False, 'group', '', 5),
    (True, 'iii', '', 3),
    (True, 'ii', '', 3),
    (False, 'il giovane', '', 7),
    (False, 'il vecchio', '', 7),
    (False, 'il', '', 15),
    (False, "in't", '', 7),
    (False, 'in het', '', 7),
    (True, 'iv', '', 3),
    (True, 'ix', '', 3),
    (True, 'i', '', 3),
    (False, 'jr.', '', 3),
    (False, 'jr', '', 3),
    (False, 'juniore', '', 3),
    (False, 'junior', '', 3),
    (False, 'king of', '', 7),
    (False, "l'", " l'", 15),
    (False, "l'aine", '', 3),
    (False, 'la', '', 15),
    (False, 'le jeune', '', 3),
    (False, 'le', '', 15),
    (False, 'lo', '', 15),
    (False, 'maestro', '', 7),
    (False, 'maitre', '', 7),
    (False, 'marchioness', '', 7),
    (False, 'markgrafin von', '', 7),
    (False, 'marquess', '', 7),
    (False, 'marquis', '', 7),
    (False, 'master of the', '', 7),
    (False, 'master of', '', 7),
    (False, 'master known as the', '', 7),
    (False, 'master with the', '', 7),
    (False, 'master with', '', 7),
    (False, 'masters', '', 7),
    (False, 'master', '', 7),
    (False, 'meister', '', 7),
    (False, 'met de', '', 7),
    (False, 'met', '', 7),
    (False, 'mlle.', '', 7),
    (False, 'mlle', '', 7),
    (False, 'monogrammist', '', 7),
    (False, 'monsu', '', 7),
    (False, 'nee', '', 2),
    (False, 'of', '', 3),
    (False, 'oncle', '', 3),
    (False, 'op den', '', 15),
    (False, 'op de', '', 15),
    (False, 'or', '', 2),
    (False, 'over den', '', 15),
    (False, 'over de', '', 15),
    (False, 'over', '', 7),
    (False, 'p.re', '', 7),
    (False, 'p.r.a.', '', 1),
    (False, 'padre', '', 7),
    (False, 'painter', '', 7),
    (False, 'pere', '', 3),
    (False, 'possibly identified with', '', 6),
    (False, 'possibly', '', 6),
    (False, 'pseudo', '', 15),
    (False, 'r.a.', '', 1),
    (False, 'reichsgraf von', '', 7),
    (False, 'ritter von', '', 7),
    (False, 'sainte-', ' sainte-', 8),
    (False, 'sainte', '', 7),
    (False, 'saint-', ' saint-', 8),
    (False, 'saint', '', 7),
    (False, 'santa', '', 15),
    (False, "sant'", " sant'", 15),
    (False, 'san', '', 15),
    (False, 'ser', '', 7),
    (False, 'seniore', '', 3),
    (False, 'senior', '', 3),
    (False, 'sir', '', 5),
    (False, 'sr.', '', 3),
    (False, 'sr', '', 3),
    (False, 'ss.', ' ss.', 14),
    (False, 'ss', '', 6),
    (False, 'st-', ' st-', 8),
    (False, 'st.', ' st.', 15),
    (False, 'ste-', ' ste-', 8),
    (False, 'ste.', ' ste.', 15),
    (False, 'studio', '', 7),
    (False, 'sub-group', '', 5),
    (False, 'sultan of', '', 7),
    (False, 'ten', '', 15),
    (False, 'ter', '', 15),
    (False, 'the elder', '', 3),
    (False, 'the younger', '', 3),
    (False, 'the', '', 7),
    (False, 'tot', '', 15),
    (False, 'unidentified', '', 1),
    (False, 'van den', '', 15),
    (False, 'van der', '', 15),
    (False, 'van de', '', 15),
    (False, 'vanden', '', 15),
    (False, 'vander', '', 15),
    (False, 'van', '', 15),
    (False, 'vecchia', '', 7),
    (False, 'vecchio', '', 7),
    (True, 'viii', '', 3),
    (True, 'vii', '', 3),
    (True, 'vi', '', 3),
    (True, 'v', '', 3),
    (False, 'vom', '', 7),
    (False, 'von', '', 15),
    (False, 'workshop', '', 7),
    (True, 'xiii', '', 3),
    (True, 'xii', '', 3),
    (True, 'xiv', '', 3),
    (True, 'xix', '', 3),
    (True, 'xi', '', 3),
    (True, 'xviii', '', 3),
    (True, 'xvii', '', 3),
    (True, 'xvi', '', 3),
    (True, 'xv', '', 3),
    (True, 'xx', '', 3),
    (True, 'x', '', 3),
    (False, 'y', '', 7),
)


def synoname_toolcode(lname, fname='', qual='', normalize=0):
    """Build the Synoname toolcode.

    Cf. :cite:`Getty:1991,Gross:1991`.

    :param str lname: last name
    :param str fname: first name (can be blank)
    :param str qual: qualifier
    :param int normalize: normalization mode (0, 1, or 2)
    :returns: the transformed last and first names and the synoname toolcode
    :rtype: tuple

    >>> synoname_toolcode('hat')
    ('hat', '', '0000000003$$h')
    >>> synoname_toolcode('niall')
    ('niall', '', '0000000005$$n')
    >>> synoname_toolcode('colin')
    ('colin', '', '0000000005$$c')
    >>> synoname_toolcode('atcg')
    ('atcg', '', '0000000004$$a')
    >>> synoname_toolcode('entreatment')
    ('entreatment', '', '0000000011$$e')

    >>> synoname_toolcode('Ste.-Marie', 'Count John II', normalize=2)
    ('ste.-marie ii', 'count john', '0200491310$015b049a127c$smcji')
    >>> synoname_toolcode('Michelangelo IV', '', 'Workshop of')
    ('michelangelo iv', '', '3000550015$055b$mi')
    """
    method_dict = {
        'end': 1,
        'middle': 2,
        'beginning': 4,
        'beginning_no_space': 8,
    }

    lname = lname.lower()
    fname = fname.lower()
    qual = qual.lower()

    # Start with the basic code
    toolcode = ['0', '0', '0', '000', '00', '00', '$', '', '$', '']

    full_name = ' '.join((lname, fname))

    # Fill field 0 (qualifier)
    qual_3 = {
        'adaptation after',
        'after',
        'assistant of',
        'assistants of',
        'circle of',
        'follower of',
        'imitator of',
        'in the style of',
        'manner of',
        'pupil of',
        'school of',
        'studio of',
        'style of',
        'workshop of',
    }
    qual_2 = {'copy after', 'copy after?', 'copy of'}
    qual_1 = {
        'ascribed to',
        'attributed to or copy after',
        'attributed to',
        'possibly',
    }

    if qual in qual_3:
        toolcode[0] = '3'
    elif qual in qual_2:
        toolcode[0] = '2'
    elif qual in qual_1:
        toolcode[0] = '1'

    # Fill field 1 (punctuation)
    if '.' in full_name:
        toolcode[1] = '2'
    else:
        for punct in ',-/:;"&\'()!{|}?$%*+<=>[\\]^_`~':
            if punct in full_name:
                toolcode[1] = '1'
                break

    # Fill field 2 (generation)
    gen_1 = (
        'the elder',
        ' sr.',
        ' sr',
        'senior',
        'der altere',
        'il vecchio',
        "l'aine",
        'p.re',
        'padre',
        'seniore',
        'vecchia',
        'vecchio',
    )
    gen_2 = (
        ' jr.',
        ' jr',
        'der jungere',
        'il giovane',
        'giovane',
        'juniore',
        'junior',
        'le jeune',
        'the younger',
    )

    elderyounger = ''  # save elder/younger for possible movement later
    for gen in gen_1:
        if gen in full_name:
            toolcode[2] = '1'
            elderyounger = gen
            break
    else:
        for gen in gen_2:
            if gen in full_name:
                toolcode[2] = '2'
                elderyounger = gen
                break

    # do comma flip
    if normalize:
        comma = lname.find(',')
        if comma != -1:
            # lstrip() drops leading spaces and commas and, unlike indexing
            # lname_end[0], cannot fail when nothing follows the comma.
            lname_end = lname[comma + 1 :].lstrip(' ,')
            fname = lname_end + ' ' + fname
            lname = lname[:comma].strip()

    # do elder/younger move
    if normalize == 2 and elderyounger:
        elderyounger_loc = fname.find(elderyounger)
        if elderyounger_loc != -1:
            lname = ' '.join((lname, elderyounger.strip()))
            fname = ' '.join(
                (
                    fname[:elderyounger_loc].strip(),
                    fname[elderyounger_loc + len(elderyounger) :],
                )
            ).strip()

    toolcode[4] = '{:02d}'.format(len(fname))
    toolcode[5] = '{:02d}'.format(len(lname))

    # strip punctuation
    for char in ',/:;"&()!{|}?$%*+<=>[\\]^_`~':
        full_name = full_name.replace(char, '')
    for pos, char in enumerate(full_name):
        if char == '-' and full_name[pos - 1 : pos + 2] != 'b-g':
            full_name = full_name[:pos] + ' ' + full_name[pos + 1 :]

    # Fill field 9 (search range)
    for letter in [_[0] for _ in full_name.split()]:
        if letter not in toolcode[9]:
            toolcode[9] += letter
        if len(toolcode[9]) == 15:
            break

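    def roman_check(numeral, fname, lname):
        """Move Roman numerals from first name to last."""
        loc = fname.find(numeral)
        # Move the numeral only if it was found in fname and either ends
        # fname or is followed by a space or comma. The parentheses matter:
        # `and` binds tighter than `or`, so without them the index lookup
        # also runs when loc == -1.
        if loc != -1 and (
            len(fname[loc:]) == len(numeral)
            or fname[loc + len(numeral)] in {' ', ','}
        ):
            lname = ' '.join((lname, numeral))
            fname = ' '.join(
                (fname[:loc].strip(), fname[loc + len(numeral) :].lstrip(' ,'))
            )
        return fname.strip(), lname.strip()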

    # Fill fields 7 (specials) and 3 (roman numerals)
    for num, special in enumerate(_synoname_special_table):
        roman, match, extra, method = special
        if method & method_dict['end']:
            match_context = ' ' + match
            loc = full_name.find(match_context)
            if (len(full_name) > len(match_context)) and (
                loc == len(full_name) - len(match_context)
            ):
                if roman:
                    if not any(abbr in fname for abbr in ('i.', 'v.', 'x.')):
                        full_name = full_name[:loc]
                        toolcode[7] += '{:03d}'.format(num) + 'a'
                        if toolcode[3] == '000':
                            toolcode[3] = '{:03d}'.format(num)
                        if normalize == 2:
                            fname, lname = roman_check(match, fname, lname)
                else:
                    full_name = full_name[:loc]
                    toolcode[7] += '{:03d}'.format(num) + 'a'
        if method & method_dict['middle']:
            match_context = ' ' + match + ' '
            loc = 0
            while loc != -1:
                loc = full_name.find(match_context, loc + 1)
                if loc > 0:
                    if roman:
                        if not any(
                            abbr in fname for abbr in ('i.', 'v.', 'x.')
                        ):
                            full_name = (
                                full_name[:loc]
                                + full_name[loc + len(match) + 1 :]
                            )
                            toolcode[7] += '{:03d}'.format(num) + 'b'
                            if toolcode[3] == '000':
                                toolcode[3] = '{:03d}'.format(num)
                            if normalize == 2:
                                fname, lname = roman_check(match, fname, lname)
                    else:
                        full_name = (
                            full_name[:loc] + full_name[loc + len(match) + 1 :]
                        )
                        toolcode[7] += '{:03d}'.format(num) + 'b'
        if method & method_dict['beginning']:
            match_context = match + ' '
            loc = full_name.find(match_context)
            if loc == 0:
                full_name = full_name[len(match) + 1 :]
                toolcode[7] += '{:03d}'.format(num) + 'c'
        if method & method_dict['beginning_no_space']:
            loc = full_name.find(match)
            if loc == 0:
                toolcode[7] += '{:03d}'.format(num) + 'd'
                if full_name[: len(match)] not in toolcode[9]:
                    toolcode[9] += full_name[: len(match)]

        if extra:
            loc = full_name.find(extra)
            if loc != -1:
                toolcode[7] += '{:03d}'.format(num) + 'X'
                # Since extras are unique, we only look for each of them
                # once, and they include otherwise impossible characters for
                # this field, it's not possible for the following line to have
                # ever been false.
                # if full_name[loc:loc+len(extra)] not in toolcode[9]:
                toolcode[9] += full_name[loc : loc + len(match)]

    return lname, fname, ''.join(toolcode)


if __name__ == '__main__':
    import doctest

    doctest.testmod()
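
The comments in `synoname_toolcode()` number the parts of the returned string as fields 0 to 9. As a minimal sketch of that layout, the helper below splits a toolcode back into those fields. The name `parse_toolcode` and the dictionary keys are hypothetical and not part of Abydos. It assumes both name lengths fit in two digits and that the search range is everything after the second `$`.

```python
import re


def parse_toolcode(toolcode):
    """Split a Synoname toolcode string into named fields (illustrative)."""
    # Fields 0-5 form a fixed-width, 10-character head before the first '$'.
    head, _, rest = toolcode.partition('$')
    # Field 7 (specials) sits between the two '$' separators; field 9
    # (search range) is whatever follows the second one.
    specials, _, search_range = rest.partition('$')
    return {
        'qualifier': head[0],
        'punctuation': head[1],
        'generation': head[2],
        'roman': head[3:6],
        'fname_len': int(head[6:8]),
        'lname_len': int(head[8:10]),
        # Each special is a 3-digit table index plus a position letter:
        # a = end, b = middle, c = beginning, d = beginning without a
        # space, X = an "extra" pattern was found.
        'specials': [
            (int(index), kind)
            for index, kind in re.findall(r'(\d{3})([abcdX])', specials)
        ],
        'search_range': search_range,
    }


# Toolcode taken from the 'Ste.-Marie' doctest above.
print(parse_toolcode('0200491310$015b049a127c$smcji'))
```

Here index 15 is the `'count'` row of `_synoname_special_table`, matched in the middle of the full name.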


Issues reported for this file:

Coding Style / Naming (introduced 2018-10-23 05:52 UTC), at `_synoname_special_table = (`:
The name `_synoname_special_table` does not conform to the constant naming conventions (`(([A-Z_][A-Z0-9_]*)|(__.*__))$`). This check looks for invalid names for a range of different identifiers. You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements. If your project includes a Pylint configuration file, the settings contained in that file take precedence.

Comprehensibility (introduced 2018-09-03 11:25 UTC), at `def synoname_toolcode(lname, fname='', qual='', normalize=0):`:
This function exceeds the maximum number of variables (30/15).

Coding Style (introduced 2018-10-24 06:00 UTC), reported separately on each of these continuation lines:
Wrong hanging indentation before block (add 4 spaces).
    loc != -1
    and (len(fname[loc:]) == len(numeral))
    or fname[loc + len(numeral)] in {' ', ','}
    loc == len(full_name) - len(match_context)
    abbr in fname for abbr in ('i.', 'v.', 'x.')

unused-code (introduced 2018-09-25 05:30 UTC), at `for num, special in enumerate(_synoname_special_table):`:
Too many nested blocks (6/5).

unused-code (introduced 2018-10-05 08:54 UTC), at `for num, special in enumerate(_synoname_special_table):`:
Too many nested blocks (7/5).
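
One common way to address the "too many variables" and "too many nested blocks" findings for `synoname_toolcode()` is to move each field computation into a small module-level helper. The sketch below does this for field 0 (qualifier) and field 2 (generation) only. The helper and constant names are hypothetical and this is not how Abydos itself is organised. The term lists are copied from the function body, and the lookup order (elder terms before younger terms) is kept the same.

```python
# Qualifier phrases grouped by the digit written to field 0.
_QUAL_GROUPS = (
    ('3', ('adaptation after', 'after', 'assistant of', 'assistants of',
           'circle of', 'follower of', 'imitator of', 'in the style of',
           'manner of', 'pupil of', 'school of', 'studio of', 'style of',
           'workshop of')),
    ('2', ('copy after', 'copy after?', 'copy of')),
    ('1', ('ascribed to', 'attributed to or copy after', 'attributed to',
           'possibly')),
)
_QUAL_CODE = {qual: code for code, quals in _QUAL_GROUPS for qual in quals}

# Generation terms grouped by the digit written to field 2.
_GEN_GROUPS = (
    ('1', ('the elder', ' sr.', ' sr', 'senior', 'der altere', 'il vecchio',
           "l'aine", 'p.re', 'padre', 'seniore', 'vecchia', 'vecchio')),
    ('2', (' jr.', ' jr', 'der jungere', 'il giovane', 'giovane', 'juniore',
           'junior', 'le jeune', 'the younger')),
)


def qualifier_field(qual):
    """Return the field 0 digit for a qualifier, or '0' if unknown."""
    return _QUAL_CODE.get(qual.lower(), '0')


def generation_field(full_name):
    """Return (field 2 digit, matched term); ('0', '') if nothing matches."""
    for code, terms in _GEN_GROUPS:
        for term in terms:
            if term in full_name:
                return code, term
    return '0', ''


print(qualifier_field('Workshop of'))
print(generation_field('smith jr.'))
```

With helpers like these, the main function would keep only the calls and the assignments into `toolcode`, which removes several local variables and the `for`/`else` nesting.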
