abydos.distance._ncd_lzss   A

Complexity

Total Complexity 3

Size/Duplication

Total Lines 101
Duplicated Lines 60.4 %

Test Coverage

Coverage 50%

Importance

Changes 0
Metric Value
wmc 3
eloc 24
dl 61
loc 101
ccs 8
cts 16
cp 0.5
rs 10
c 0
b 0
f 0

1 Method

Rating   Name   Duplication   Size   Complexity  
A NCDlzss.dist() 51 51 3


Duplicated Code

Duplicate code is one of the most pungent code smells. A common rule of thumb is to restructure code once it is duplicated in three or more places.
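One way to apply that rule to a family of compression-distance classes is to move the shared arithmetic into a single helper and pass the compressor in as a parameter. The following is a minimal sketch under stated assumptions, not the project's actual structure: the helper `ncd` and the classes `NCDzlibSketch` and `NCDbz2Sketch` are hypothetical names, and the standard-library `zlib` and `bz2` compressors stand in for LZSS so the example runs without extra dependencies.

```python
import bz2
import zlib
from typing import Callable


def ncd(src: str, tar: str, compress: Callable[[bytes], bytes]) -> float:
    """Return the normalized compression distance under `compress`.

    Hypothetical shared helper: every compressor-specific class can
    delegate to it instead of repeating the same arithmetic.
    """
    if src == tar:
        return 0.0

    src_b, tar_b = src.encode("utf-8"), tar.encode("utf-8")
    src_len = len(compress(src_b))
    tar_len = len(compress(tar_b))
    # Compress both concatenation orders and keep the shorter result.
    concat_len = min(
        len(compress(src_b + tar_b)),
        len(compress(tar_b + src_b)),
    )

    return (concat_len - min(src_len, tar_len)) / max(src_len, tar_len)


class NCDzlibSketch:
    """Hypothetical thin wrapper: only the compressor differs."""

    def dist(self, src: str, tar: str) -> float:
        return ncd(src, tar, zlib.compress)


class NCDbz2Sketch:
    """Hypothetical thin wrapper around the same helper, using bz2."""

    def dist(self, src: str, tar: str) -> float:
        return ncd(src, tar, bz2.compress)
```

With this shape, adding another compressor would mean writing one short `dist()` method rather than copying the whole body, which is the kind of change a duplication warning is usually asking for.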


# Copyright 2019-2020 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.distance._ncd_lzss.

NCD using LZSS
"""

from ._distance import _Distance

try:
    import lzss
except ImportError:  # pragma: no cover
    # If the system lacks the lzss library, that's fine, but LZSS compression
    # similarity won't be supported.
    lzss = None  # type: ignore

__all__ = ['NCDlzss']


class NCDlzss(_Distance):
    """Normalized Compression Distance using LZSS compression.

    Cf. https://en.wikipedia.org/wiki/Lempel-Ziv-Storer-Szymanski

    Normalized compression distance (NCD) :cite:`Cilibrasi:2005`.

    .. versionadded:: 0.4.0
    """

    def dist(self, src: str, tar: str) -> float:
        """Return the NCD between two strings using LZSS compression.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        float
            Compression distance

        Raises
        ------
        ValueError
            Install the PyLZSS module in order to use LZSS

        Examples
        --------
        >>> cmp = NCDlzss()
        >>> cmp.dist('cat', 'hat')
        0.75
        >>> cmp.dist('Niall', 'Neil')
        1.0
        >>> cmp.dist('aluminum', 'Catalan')
        1.0
        >>> cmp.dist('ATCG', 'TAGC')
        0.8


        .. versionadded:: 0.4.0

        """
        if src == tar:
            return 0.0

        if lzss is not None:
            src_comp = lzss.encode(src)
            tar_comp = lzss.encode(tar)
            concat_comp = lzss.encode(src + tar)
            concat_comp2 = lzss.encode(tar + src)
        else:  # pragma: no cover
            raise ValueError('Install the PyLZSS module in order to use LZSS')

        return (
            min(len(concat_comp), len(concat_comp2))
            - min(len(src_comp), len(tar_comp))
        ) / max(len(src_comp), len(tar_comp))


if __name__ == '__main__':
    import doctest

    doctest.testmod()
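The value returned by `dist()` in the listing can be read as a single formula. As a notational assumption (the symbols are not taken from the source), write C(x) for the length of the LZSS-encoded form of x and xy for the concatenation of x and y:

```latex
\mathrm{NCD}(x, y) =
  \frac{\min\{C(xy),\, C(yx)\} - \min\{C(x),\, C(y)\}}
       {\max\{C(x),\, C(y)\}}
```

The numerator corresponds to the two `min(...)` terms in the `return` expression and the denominator to the `max(...)` term. The case x = y is handled separately by the early `return 0.0`, before anything is compressed.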