Pull Request #225 (master) by Chris, created 09:15, Completed

abydos.distance._isg.ISG._isg_i()   F

Complexity

Conditions 14

Size

Total Lines 53
Code Lines 29

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 29
CRAP Score 14

Importance

Changes 0
Metric Value
eloc 29
dl 0
loc 53
ccs 29
cts 29
cp 1
rs 3.6
c 0
b 0
f 0
cc 14
nop 3
crap 14

How to fix

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

A commonly applied refactoring for this is Extract Method.
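As an illustration only, here is one way the per-position checks of `_isg_i()` might be pulled out into small named helpers. The helper names (`_in_window`, `_same_at`, `_position_matches`, `isg_i`) are hypothetical and not part of Abydos; the sketch is written as free functions for Python 3 (true division) and assumes at least one of the two strings is non-empty, since `sim()` already handles equal strings before calling `_isg_i()`.

```python
def _char_at(name, pos):
    """Return the character at pos, or None when pos is past the end."""
    return name[pos] if pos < len(name) else None


def _in_window(char, other, pos):
    """Check char against other[pos - 1 : pos + 3] (hypothetical helper)."""
    return bool(char) and char in other[max(0, pos - 1) : pos + 3]


def _same_at(src, tar, pos):
    """Check that both strings hold the same character at pos."""
    s = _char_at(src, pos)
    return s is not None and s == _char_at(tar, pos)


def _position_matches(src, tar, pos, full_guth):
    """Decide whether position pos of src counts as a match."""
    if _in_window(src[pos], tar, pos):
        return True
    if not full_guth:
        return False
    # The extra checks applied only when full_guth is set in the listing.
    return (
        _in_window(_char_at(tar, pos), src, pos)
        or _same_at(src, tar, pos + 1)
        or _same_at(src, tar, pos + 2)
    )


def isg_i(src, tar, full_guth=False):
    """Directed ISG similarity from src to tar (sketch, not the Abydos API)."""
    matches = sum(
        _position_matches(src, tar, pos, full_guth) for pos in range(len(src))
    )
    return matches / (len(src) + len(tar) - matches)
```

Each helper now carries one condition, so the loop body reduces to a single call and the names document what the original inline comments would have had to explain.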

Complexity

Complex code such as abydos.distance._isg.ISG._isg_i() often does a lot of different things. To break it down, we need to identify a cohesive component within the enclosing class. A common approach to finding such a component is to look for fields/methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
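A rough sketch of what Extract Class could look like here, assuming the matching rules are the cohesive component. `GuthMatcher` and `IsgSketch` are hypothetical names used only for illustration; they are not Abydos classes, and `IsgSketch` does not inherit from `_Distance`. The sketch targets Python 3.

```python
class GuthMatcher:
    """Hypothetical helper class that owns the position-matching rules."""

    def __init__(self, full_guth=False):
        # The basic window rule always applies; the rest only with full_guth.
        self._rules = [self._src_in_tar_window]
        if full_guth:
            self._rules += [
                self._tar_in_src_window,
                self._same_one_ahead,
                self._same_two_ahead,
            ]

    @staticmethod
    def _char_at(name, pos):
        return name[pos] if pos < len(name) else None

    def _src_in_tar_window(self, src, tar, pos):
        return src[pos] in tar[max(0, pos - 1) : pos + 3]

    def _tar_in_src_window(self, src, tar, pos):
        t = self._char_at(tar, pos)
        return t is not None and t in src[max(0, pos - 1) : pos + 3]

    def _same_at(self, src, tar, pos):
        s = self._char_at(src, pos)
        return s is not None and s == self._char_at(tar, pos)

    def _same_one_ahead(self, src, tar, pos):
        return self._same_at(src, tar, pos + 1)

    def _same_two_ahead(self, src, tar, pos):
        return self._same_at(src, tar, pos + 2)

    def matches(self, src, tar, pos):
        """Return True if any active rule accepts position pos."""
        return any(rule(src, tar, pos) for rule in self._rules)


class IsgSketch:
    """Simplified stand-in for ISG that delegates matching to GuthMatcher."""

    def __init__(self, full_guth=False, symmetric=True):
        self._matcher = GuthMatcher(full_guth)
        self._symmetric = symmetric

    def _isg_i(self, src, tar):
        matches = sum(
            self._matcher.matches(src, tar, pos) for pos in range(len(src))
        )
        return matches / (len(src) + len(tar) - matches)

    def sim(self, src, tar):
        if src == tar:
            return 1.0
        if len(src) > len(tar):
            src, tar = tar, src
        elif self._symmetric and len(src) == len(tar):
            return max(self._isg_i(src, tar), self._isg_i(tar, src))
        return self._isg_i(src, tar)
```

With the rules collected in a list, `_isg_i()` no longer branches on `full_guth` inside the loop, which is where most of the reported conditions come from.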

# -*- coding: utf-8 -*-

# Copyright 2019 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.distance._isg.

Bouchard & Pouyez's Indice de Similitude-Guth (ISG)
"""

from __future__ import (
    absolute_import,
    division,
    print_function,
    unicode_literals,
)

from ._distance import _Distance

__all__ = ['ISG']


class ISG(_Distance):
    """Indice de Similitude-Guth (ISG) similarity.

    This is an implementation of Bouchard & Pouyez's Indice de Similitude-Guth
    (ISG) :cite:`Bouchard:1980`. At its heart, ISG is Jaccard similarity, but
    limits on token matching are added according to part of Guth's matching
    criteria :cite:`Guth:1976`.

    :cite:`Bouchard:1980` is limited in its implementation details. Based on
    the examples given in the paper, it appears that only the first 4 of Guth's
    rules are considered (a letter in the first string must match a letter in
    the second string appearing in the same position, an adjacent position, or
    two positions ahead). It also appears that the distance in the paper is
    the greater of the distance from string 1 to string 2 and the distance
    from string 2 to string 1.

    These qualities can be specified as parameters. At initialization, specify
    ``full_guth=True`` to apply all of Guth's rules and ``symmetric=False`` to
    calculate only the distance from string 1 to string 2.

    .. versionadded:: 0.4.1
    """

    def __init__(self, full_guth=False, symmetric=True, **kwargs):
        """Initialize ISG instance.

        Parameters
        ----------
        full_guth : bool
            Whether to apply all of Guth's matching rules
        symmetric : bool
            Whether to calculate the symmetric distance
        **kwargs
            Arbitrary keyword arguments


        .. versionadded:: 0.4.1

        """
        super(ISG, self).__init__(**kwargs)
        self._full_guth = full_guth
        self._symmetric = symmetric

    def _isg_i(self, src, tar):
        """Return an individual ISG similarity (not symmetric) for src to tar.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        float
            The ISG similarity


        .. versionadded:: 0.4.1

        """

        def _char_at(name, pos):
            if pos >= len(name):
                return None
            return name[pos]

        matches = 0
        for pos in range(len(src)):
            s = _char_at(src, pos)
            t = set(tar[max(0, pos - 1) : pos + 3])
            if s and s in t:
                matches += 1
                continue

            if self._full_guth:
                s = set(src[max(0, pos - 1) : pos + 3])
                t = _char_at(tar, pos)
                if t and t in s:
                    matches += 1
                    continue

                s = _char_at(src, pos + 1)
                t = _char_at(tar, pos + 1)
                if s and t and s == t:
                    matches += 1
                    continue

                s = _char_at(src, pos + 2)
                t = _char_at(tar, pos + 2)
                if s and t and s == t:
                    matches += 1
                    continue

        return matches / (len(src) + len(tar) - matches)

    def sim(self, src, tar):
        """Return the Indice de Similitude-Guth (ISG) similarity of two words.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        float
            The ISG similarity

        Examples
        --------
        >>> cmp = ISG()
        >>> cmp.sim('cat', 'hat')
        0.5
        >>> cmp.sim('Niall', 'Neil')
        0.5
        >>> cmp.sim('aluminum', 'Catalan')
        0.15384615384615385
        >>> cmp.sim('ATCG', 'TAGC')
        1.0


        .. versionadded:: 0.4.1

        """
        if src == tar:
            return 1.0
        if len(src) > len(tar):
            src, tar = tar, src
        elif self._symmetric and len(src) == len(tar):
            return max(self._isg_i(src, tar), self._isg_i(tar, src))
        return self._isg_i(src, tar)


if __name__ == '__main__':
    import doctest

    doctest.testmod()
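
The class docstring says the reported value is the greater of the two directed values. The small standalone sketch below shows why direction can matter: the matching window reaches one position back but two positions ahead, so it is not symmetric. `directed_isg` is a hypothetical function that covers only the default (non-`full_guth`) rule and assumes Python 3 division; the input strings are made up for the illustration.

```python
def directed_isg(src, tar):
    """Basic directed similarity from src to tar (default rule only)."""
    matches = sum(
        1
        for pos, char in enumerate(src)
        if char in tar[max(0, pos - 1) : pos + 3]
    )
    return matches / (len(src) + len(tar) - matches)


# 'a' at position 0 finds the 'a' two positions ahead in the target.
forward = directed_isg('axxx', 'yyay')

# 'a' at position 2 cannot look two positions back, so nothing matches.
backward = directed_isg('yyay', 'axxx')

# For equal-length strings with symmetric=True, sim() keeps the larger value.
symmetric = max(forward, backward)
```

Here `forward` is 1/7 and `backward` is 0, so the symmetric result equals the forward value.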