Completed
Push — master ( f43547...71985b )
by Chris
12:00 queued 10s
created

abydos.distance._cosine   A

Complexity

Total Complexity 6

Size/Duplication

Total Lines 161
Duplicated Lines 26.09 %

Test Coverage

Coverage 100%

Importance

Changes 0
Metric Value
eloc 27
dl 42
loc 161
ccs 20
cts 20
cp 1
rs 10
c 0
b 0
f 0
wmc 6

1 Method

Rating   Name   Duplication   Size   Complexity  
A Cosine.sim() 34 41 4

2 Functions

Rating   Name   Duplication   Size   Complexity  
A sim_cosine() 0 32 1
A dist_cosine() 0 32 1


Duplicated Code

Duplicate code is one of the most pungent code smells. A common rule of thumb is to restructure code once it appears in three or more places.

Common duplication problems and their corresponding solutions are:
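One frequent case, sketched here as an illustration rather than taken from Abydos: several similarity measures repeat the same token-counting steps before applying their own formula. Assuming plain token lists as input, and with hypothetical names (`_overlap_counts`, `cosine`, `dice`), the shared steps can be moved into a single helper so that each measure keeps only its formula:

```python
from collections import Counter
from math import sqrt


def _overlap_counts(src_tokens, tar_tokens):
    """Return (|X|, |Y|, |X & Y|) for two token multisets.

    Hypothetical helper: it holds the counting logic that would otherwise
    be copied into every measure.
    """
    src, tar = Counter(src_tokens), Counter(tar_tokens)
    return sum(src.values()), sum(tar.values()), sum((src & tar).values())


def cosine(src_tokens, tar_tokens):
    """Cosine similarity built on the shared helper."""
    x, y, both = _overlap_counts(src_tokens, tar_tokens)
    return both / sqrt(x * y) if x and y else 0.0


def dice(src_tokens, tar_tokens):
    """Dice coefficient built on the same helper, with no repeated counting code."""
    x, y, both = _overlap_counts(src_tokens, tar_tokens)
    return 2 * both / (x + y) if x + y else 0.0
```

Whether this particular extraction suits the flagged class depends on what its duplicate elsewhere in the project looks like, which this report does not show.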

# -*- coding: utf-8 -*-

# Copyright 2014-2018 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.distance._cosine.

Cosine similarity & distance
"""

from __future__ import (
    absolute_import,
    division,
    print_function,
    unicode_literals,
)

from math import sqrt

from ._token_distance import _TokenDistance

__all__ = ['Cosine', 'dist_cosine', 'sim_cosine']


# Unused Code: The variable __class__ seems to be unused.
# Duplication: This code seems to be duplicated in your project.
class Cosine(_TokenDistance):
    r"""Cosine similarity.

    For two sets X and Y, the cosine similarity, Otsuka-Ochiai coefficient, or
    Ochiai coefficient :cite:`Otsuka:1936,Ochiai:1957` is:
    :math:`sim_{cosine}(X, Y) = \frac{|X \cap Y|}{\sqrt{|X| \cdot |Y|}}`.
    """

    # Bug: Parameters differ from overridden 'sim' method
    def sim(self, src, tar, qval=2):
        r"""Return the cosine similarity of two strings.

        Parameters
        ----------
        src : str
            Source string (or QGrams/Counter objects) for comparison
        tar : str
            Target string (or QGrams/Counter objects) for comparison
        qval : int
            The length of each q-gram; 0 for non-q-gram version

        Returns
        -------
        float
            Cosine similarity

        Examples
        --------
        >>> cmp = Cosine()
        >>> cmp.sim('cat', 'hat')
        0.5
        >>> cmp.sim('Niall', 'Neil')
        0.3651483716701107
        >>> cmp.sim('aluminum', 'Catalan')
        0.11785113019775793
        >>> cmp.sim('ATCG', 'TAGC')
        0.0

        """
        if src == tar:
            return 1.0
        if not src or not tar:
            return 0.0

        q_src, q_tar = self._get_qgrams(src, tar, qval)
        q_src_mag = sum(q_src.values())
        q_tar_mag = sum(q_tar.values())
        q_intersection_mag = sum((q_src & q_tar).values())

        return q_intersection_mag / sqrt(q_src_mag * q_tar_mag)


def sim_cosine(src, tar, qval=2):
    r"""Return the cosine similarity of two strings.

    This is a wrapper for :py:meth:`Cosine.sim`.

    Parameters
    ----------
    src : str
        Source string (or QGrams/Counter objects) for comparison
    tar : str
        Target string (or QGrams/Counter objects) for comparison
    qval : int
        The length of each q-gram; 0 for non-q-gram version

    Returns
    -------
    float
        Cosine similarity

    Examples
    --------
    >>> sim_cosine('cat', 'hat')
    0.5
    >>> sim_cosine('Niall', 'Neil')
    0.3651483716701107
    >>> sim_cosine('aluminum', 'Catalan')
    0.11785113019775793
    >>> sim_cosine('ATCG', 'TAGC')
    0.0

    """
    return Cosine().sim(src, tar, qval)


def dist_cosine(src, tar, qval=2):
    """Return the cosine distance between two strings.

    This is a wrapper for :py:meth:`Cosine.dist`.

    Parameters
    ----------
    src : str
        Source string (or QGrams/Counter objects) for comparison
    tar : str
        Target string (or QGrams/Counter objects) for comparison
    qval : int
        The length of each q-gram; 0 for non-q-gram version

    Returns
    -------
    float
        Cosine distance

    Examples
    --------
    >>> dist_cosine('cat', 'hat')
    0.5
    >>> dist_cosine('Niall', 'Neil')
    0.6348516283298893
    >>> dist_cosine('aluminum', 'Catalan')
    0.882148869802242
    >>> dist_cosine('ATCG', 'TAGC')
    1.0

    """
    return Cosine().dist(src, tar, qval)


if __name__ == '__main__':
    import doctest

    doctest.testmod()
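
The formula in the `Cosine` docstring can be checked by hand with a minimal, self-contained sketch. It assumes bigrams padded with `$` at the start and `#` at the end, an assumption that happens to reproduce the doctest values in the listing. The names `bigrams`, `cosine_sim`, and `cosine_dist` are hypothetical and are not part of Abydos, and the distance is taken as one minus the similarity, which is consistent with the `dist_cosine` doctests.

```python
from collections import Counter
from math import sqrt


def bigrams(word, start='$', stop='#'):
    """Count the padded 2-grams of ``word`` (padding symbols are an assumption)."""
    padded = start + word + stop
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine_sim(src, tar):
    """Return |X & Y| / sqrt(|X| * |Y|) over padded bigram multisets."""
    if src == tar:
        return 1.0
    if not src or not tar:
        return 0.0
    q_src, q_tar = bigrams(src), bigrams(tar)
    shared = sum((q_src & q_tar).values())  # multiset intersection size
    return shared / sqrt(sum(q_src.values()) * sum(q_tar.values()))


def cosine_dist(src, tar):
    """Treat distance as the complement of similarity (assumed here)."""
    return 1.0 - cosine_sim(src, tar)


# 'cat' and 'hat' each yield 4 bigrams and share 'at' and 't#': 2 / sqrt(4 * 4)
print(cosine_sim('cat', 'hat'))
```

For 'Niall' and 'Neil' the same counting gives 6 and 5 bigrams with 2 shared (`$N` and `l#`), so the similarity is 2 / sqrt(30), matching the 0.3651... value in the doctests.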