Passed
Pull Request — master (#527)
by Osma
03:11
created

annif.analyzer.spacy   A

Complexity

Total Complexity 6

Size/Duplication

Total Lines 37
Duplicated Lines 72.97 %

Importance

Changes 0
Metric Value
eloc 26
dl 27
loc 37
rs 10
c 0
b 0
f 0
wmc 6
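
As a quick sanity check on the figures above, the duplication percentage appears to be the duplicated line count divided by the total line count. This is a minimal sketch that assumes `dl` means duplicated lines and `loc` means total lines; the report itself does not define these abbreviations, and `duplication_percent` is a hypothetical helper name.

```python
def duplication_percent(duplicated_lines, total_lines):
    """Return duplicated lines as a percentage of total lines, to 2 decimals.

    Assumption: this mirrors how the report derives "Duplicated Lines %"
    from the dl and loc metrics; the report does not state the formula.
    """
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    return round(duplicated_lines / total_lines * 100, 2)


# Figures taken from the metric table: dl 27, loc 37
print(duplication_percent(27, 37))
```

With the values listed here the result agrees with the reported 72.97 %.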

3 Methods

Rating   Name   Duplication   Size   Complexity  
A SpacyAnalyzer.__init__() 8 8 2
A SpacyAnalyzer.tokenize_words() 7 7 2
A SpacyAnalyzer.normalize_word() 7 7 2

Duplicated Code

Duplicate code is one of the most pungent code smells. A commonly used rule is to restructure code once it is duplicated in three or more places.

Common duplication problems and their corresponding solutions are:

Duplication flagged at class SpacyAnalyzer: this code seems to be duplicated in your project.

"""Simple analyzer for Annif. Only folds words to lower case."""

import spacy
from . import analyzer
import annif.util

_KEY_LOWERCASE = 'lowercase'


class SpacyAnalyzer(analyzer.Analyzer):
    name = "spacy"

    def __init__(self, param, **kwargs):
        self.param = param
        self.nlp = spacy.load(param, exclude=['ner', 'parser'])
        if _KEY_LOWERCASE in kwargs:
            self.lowercase = annif.util.boolean(kwargs[_KEY_LOWERCASE])
        else:
            self.lowercase = False
        super().__init__(**kwargs)

    def tokenize_words(self, text):
        lemmas = [lemma for lemma in (token.lemma_ for token in self.nlp(text))
                  if self.is_valid_token(lemma)]
        if self.lowercase:
            return [lemma.lower() for lemma in lemmas]
        else:
            return lemmas

    def normalize_word(self, word):
        doc = self.nlp(word)
        lemma = doc[:].lemma_
        if self.lowercase:
            return lemma.lower()
        else:
            return lemma
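
One possible way to act on the duplication finding is to move the repeated lowercase handling into a single shared place. The report does not show where the duplicate copy lives, so this is only a sketch under assumptions: `CaseFoldingMixin`, `init_lowercase`, `fold_case`, `parse_boolean` and `WhitespaceAnalyzer` are hypothetical names, not part of Annif. `parse_boolean` stands in for `annif.util.boolean` (the accepted spellings are a guess), and whitespace splitting stands in for the spaCy pipeline so the example runs without a model.

```python
def parse_boolean(value):
    """Stand-in for annif.util.boolean; the accepted spellings are an assumption."""
    return str(value).strip().lower() in ("1", "true", "yes", "on")


class CaseFoldingMixin:
    """Hypothetical mixin that keeps the 'lowercase' option logic in one place."""

    def init_lowercase(self, kwargs, key="lowercase"):
        # Read the option once; a missing key means no lowercasing.
        self.lowercase = parse_boolean(kwargs.get(key, False))

    def fold_case(self, text):
        # Single home for the repeated "if self.lowercase: ... else: ..." branch.
        return text.lower() if self.lowercase else text


class WhitespaceAnalyzer(CaseFoldingMixin):
    """Toy analyzer: splits on whitespace instead of calling spaCy."""

    def __init__(self, **kwargs):
        self.init_lowercase(kwargs)

    def tokenize_words(self, text):
        return [self.fold_case(word) for word in text.split()]

    def normalize_word(self, word):
        return self.fold_case(word)


analyzer = WhitespaceAnalyzer(lowercase="true")
print(analyzer.tokenize_words("Hello World"))
print(analyzer.normalize_word("ABC"))
```

In the real class the same idea would leave `tokenize_words()` and `normalize_word()` responsible only for lemmatization, with the case decision delegated to one helper. Whether that helper belongs in a mixin, in the `Analyzer` base class, or in a module-level function is a design choice this report does not settle.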