annif.analyzer.spacy - Code Metrics - Inspection of "Add spaCy analyzer" - NatLibFi/Annif

Passed

Pull Request — master (#527)

by Osma

created 2022-01-19 08:38 UTC

annif.analyzer.spacy A

Size/Duplication

Total Lines	39
Duplicated Lines	74.36 %

Metric	Value
eloc	28
dl	29
loc	39
rs	10
c	0
b	0
f	0
wmc	6
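
The duplication percentage reported above appears consistent with dividing the `dl` value by the `loc` value. The following minimal sketch checks that arithmetic; reading `dl` as "duplicated lines" and `loc` as "total lines" is an assumption based on the matching numbers, not on Scrutinizer documentation.

```python
# Values copied from the metric table above.
loc = 39  # assumed meaning: total lines in the module
dl = 29   # assumed meaning: lines flagged as duplicated

# Share of duplicated lines, as a percentage rounded to two decimals.
duplicated_pct = round(100 * dl / loc, 2)
print(f"Duplicated Lines: {duplicated_pct} %")
```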

3 Methods

Rating	Name	Duplication	Size	Complexity
A	SpacyAnalyzer.__init__()	8	8	2
A	SpacyAnalyzer.tokenize_words()	9	9	2
A	SpacyAnalyzer.normalize_word()	7	7	2
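
The `wmc` value of 6 in the metric table matches the sum of the per-method complexity values listed here. A short check, assuming `wmc` is the plain sum of method complexities (the usual "weighted methods per class" reading, which this report does not spell out):

```python
# Complexity column copied from the methods table above.
method_complexity = {
    "SpacyAnalyzer.__init__()": 2,
    "SpacyAnalyzer.tokenize_words()": 2,
    "SpacyAnalyzer.normalize_word()": 2,
}

# Assumption: the class-level figure is the sum over its methods.
wmc = sum(method_complexity.values())
print(f"wmc = {wmc}")
```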

"""spaCy analyzer for Annif. Lemmatizes words using a spaCy pipeline."""

import spacy
from . import analyzer
import annif.util

# Name of the optional analyzer parameter that enables lowercasing
_KEY_LOWERCASE = 'lowercase'


class SpacyAnalyzer(analyzer.Analyzer):

    name = "spacy"

    def __init__(self, param, **kwargs):
        # param is the name of the spaCy model to load
        self.param = param
        # NER and dependency parsing are not needed for lemmatization
        self.nlp = spacy.load(param, exclude=['ner', 'parser'])
        if _KEY_LOWERCASE in kwargs:
            self.lowercase = annif.util.boolean(kwargs[_KEY_LOWERCASE])
        else:
            self.lowercase = False
        super().__init__(**kwargs)

    def tokenize_words(self, text, filter=True):
        # Lemmatize every token; optionally drop lemmas that fail validation
        lemmas = [lemma
                  for lemma in (token.lemma_
                                for token in self.nlp(text.strip()))
                  if (not filter or self.is_valid_token(lemma))]
        if self.lowercase:
            return [lemma.lower() for lemma in lemmas]
        else:
            return lemmas

    def normalize_word(self, word):
        doc = self.nlp(word)
        # Lemma of the span covering the whole input
        lemma = doc[:].lemma_
        if self.lowercase:
            return lemma.lower()
        else:
            return lemma
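
The listing repeats the same `if self.lowercase` branch in `tokenize_words()` and `normalize_word()`. One way to reduce that repetition is sketched below under loud assumptions: `LemmaAnalyzerSketch`, `_fold()`, and `toy_lemmatize()` are hypothetical names, the lemmatizer is a whitespace splitter standing in for spaCy, and `is_valid_token()` is a simplified placeholder rather than Annif's real check. It is not the project's implementation, and it may not address the cross-file duplication this report measures.

```python
def toy_lemmatize(text):
    """Hypothetical stand-in for a spaCy pipeline: splits on whitespace."""
    return text.split()


class LemmaAnalyzerSketch:
    """Hypothetical sketch; does not import spaCy or Annif."""

    def __init__(self, lemmatize, lowercase=False):
        self.lemmatize = lemmatize  # callable: str -> list of lemma strings
        self.lowercase = lowercase

    def _fold(self, lemma):
        # Single place where the lowercase option is applied
        return lemma.lower() if self.lowercase else lemma

    def is_valid_token(self, word):
        # Placeholder rule (assumption): keep words of three or more characters
        return len(word) >= 3

    def tokenize_words(self, text, filter=True):
        return [self._fold(lemma)
                for lemma in self.lemmatize(text.strip())
                if not filter or self.is_valid_token(lemma)]

    def normalize_word(self, word):
        # Joining lemmas approximates taking the lemma of the whole span
        return self._fold(" ".join(self.lemmatize(word)))


sketch = LemmaAnalyzerSketch(toy_lemmatize, lowercase=True)
tokens = sketch.tokenize_words("  Big Cats at Home ")
normalized = sketch.normalize_word("Cats")
print(tokens, normalized)
```

With the lowercase option handled in `_fold()`, both public methods share one code path for case folding, so a change to that behavior only needs to be made once.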


