| Metric | Value |
| --- | --- |
| Total Complexity | 5 |
| Total Lines | 31 |
| Duplicated Lines | 0 % |
| Changes | 0 |
| 1 | """Simple analyzer for Annif. Only folds words to lower case.""" |
||
| 2 | |||
| 3 | import spacy |
||
| 4 | from spacy.tokens import Doc, Span |
||
| 5 | from . import analyzer |
||
| 6 | |||
| 7 | |||
| 8 | class SpacyAnalyzer(analyzer.Analyzer): |
||
| 9 | name = "spacy" |
||
| 10 | |||
| 11 | def __init__(self, param, **kwargs): |
||
| 12 | self.param = param |
||
| 13 | self.nlp = spacy.load(param, exclude=['ner', 'parser']) |
||
| 14 | # we need a way to split sentences, now that parser is excluded |
||
| 15 | self.nlp.add_pipe('sentencizer') |
||
| 16 | super().__init__(**kwargs) |
||
| 17 | |||
| 18 | def tokenize_sentences(self, text): |
||
| 19 | doc = self.nlp(text) |
||
| 20 | return list(doc.sents) |
||
| 21 | |||
| 22 | def tokenize_words(self, text): |
||
| 23 | if not isinstance(text, (Doc, Span)): |
||
| 24 | text = self.nlp(text) |
||
| 25 | return [lemma for lemma in (token.lemma_ for token in text) |
||
| 26 | if self.is_valid_token(lemma)] |
||
| 27 | |||
| 28 | def normalize_word(self, word): |
||
| 29 | doc = self.nlp(word) |
||
| 30 | return doc[:].lemma_ |
||
| 31 |
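A minimal usage sketch of the class above. It assumes the spaCy pipeline `en_core_web_sm` has been installed (`python -m spacy download en_core_web_sm`) and that the module lives at `annif/analyzer/spacy.py`, as the relative `from . import analyzer` import suggests; any other installed spaCy pipeline name would work as the `param` argument.

```python
# Hypothetical usage example; the import path and model name are assumptions.
from annif.analyzer.spacy import SpacyAnalyzer

analyzer = SpacyAnalyzer("en_core_web_sm")

text = "The quick brown foxes were jumping. They ran away."

# Sentence splitting via the sentencizer pipe: returns spaCy Span objects,
# one per detected sentence.
print(analyzer.tokenize_sentences(text))

# Lemmatized word tokens, filtered by the inherited is_valid_token() check.
print(analyzer.tokenize_words(text))

# Single-word normalization; for "foxes" the lemmatizer would typically
# return "fox".
print(analyzer.normalize_word("foxes"))
```

Note that `tokenize_words` also accepts an already parsed `Doc` or `Span`, so sentences returned by `tokenize_sentences` can be passed straight back in without re-running the pipeline.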