abydos.distance._ncd_lzss.NCDlzss.dist() (rating: A)

Complexity

Conditions 3

Size

Total Lines 51
Code Lines 13

Duplication

Lines 51
Ratio 100 %

Code Coverage

Tests 1
CRAP Score 8.2077

Importance

Changes 0
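Metric  Value   Description
eloc    13      code lines
dl      51      duplicated lines
loc     51      total lines
ccs     1
cts     6
cp      0.1666  test coverage (fraction)
rs      9.75
c       0
b       0
f       0
cc      3       cyclomatic complexity
nop     3       number of parameters (self, src, tar)
crap    8.2077  CRAP score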
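Assuming the report uses the standard CRAP formula (the square of the cyclomatic complexity times the cube of the uncovered fraction, plus the complexity), the CRAP score above can be reproduced from cc = 3 and the coverage cp. The function name `crap_score` is hypothetical, and reading ccs and cts as covered and coverable statements (so that cp = 1/6) is an assumption.

```python
def crap_score(complexity: int, coverage: float) -> float:
    """Return the CRAP score for a method (hypothetical helper).

    complexity: cyclomatic complexity (the report's ``cc``)
    coverage: fraction of statements covered by tests, in [0, 1]
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError('coverage must be a fraction between 0 and 1')
    # Uncovered complexity is penalized cubically; fully covered code
    # scores exactly its cyclomatic complexity.
    return complexity ** 2 * (1.0 - coverage) ** 3 + complexity


# Values from the report: cc = 3, coverage assumed to be 1 of 6 statements.
print(round(crap_score(3, 1 / 6), 4))   # 8.2083
print(round(crap_score(3, 0.1667), 4))  # 8.2077
```

With the exact ratio 1/6 the formula gives about 8.2083; the reported 8.2077 matches a coverage rounded to 0.1667, which suggests (but does not confirm) that the coverage is rounded before the formula is applied. Raising coverage to 1.0 would bring the score down to 3.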

Long Method

Small methods make your code easier to understand, especially when combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign that the commented part should be extracted into a new method, with the comment serving as a starting point when choosing a good name for it.

Commonly applied refactorings include:

# Copyright 2019-2020 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.distance._ncd_lzss.

NCD using LZSS
"""

from ._distance import _Distance

try:
    import lzss
except ImportError:  # pragma: no cover
    # If the system lacks the lzss library, that's fine, but LZSS compression
    # similarity won't be supported.
    lzss = None  # type: ignore

__all__ = ['NCDlzss']


class NCDlzss(_Distance):
    """Normalized Compression Distance using LZSS compression.

    Cf. https://en.wikipedia.org/wiki/Lempel-Ziv-Storer-Szymanski

    Normalized compression distance (NCD) :cite:`Cilibrasi:2005`.

    .. versionadded:: 0.4.0
    """

    def dist(self, src: str, tar: str) -> float:
        """Return the NCD between two strings using LZSS compression.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        float
            Compression distance

        Raises
        ------
        ValueError
            Install the PyLZSS module in order to use LZSS

        Examples
        --------
        >>> cmp = NCDlzss()
        >>> cmp.dist('cat', 'hat')
        0.75
        >>> cmp.dist('Niall', 'Neil')
        1.0
        >>> cmp.dist('aluminum', 'Catalan')
        1.0
        >>> cmp.dist('ATCG', 'TAGC')
        0.8


        .. versionadded:: 0.4.0

        """
        if src == tar:
            return 0.0

        if lzss is not None:
            src_comp = lzss.encode(src)
            tar_comp = lzss.encode(tar)
            concat_comp = lzss.encode(src + tar)
            concat_comp2 = lzss.encode(tar + src)
        else:  # pragma: no cover
            raise ValueError('Install the PyLZSS module in order to use LZSS')

        return (
            min(len(concat_comp), len(concat_comp2))
            - min(len(src_comp), len(tar_comp))
        ) / max(len(src_comp), len(tar_comp))


if __name__ == '__main__':
    import doctest

    doctest.testmod()
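Written out, the value returned by dist() is the expression below, where C(s) stands for the length of lzss.encode(s) and xy for the concatenation src + tar. Compared with the usual NCD of Cilibrasi and Vitányi, (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}, this implementation takes the smaller of the two concatenation orders, which makes the result symmetric in x and y by construction; identical inputs short-circuit to 0.

```latex
\mathrm{NCD}(x, y) =
\begin{cases}
0, & x = y,\\[4pt]
\dfrac{\min\{C(xy),\, C(yx)\} - \min\{C(x),\, C(y)\}}{\max\{C(x),\, C(y)\}}, & \text{otherwise.}
\end{cases}
```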