TFIDFBackend - Code Metrics - Inspection of "Create a TF-IDF model. Part of #46" - NatLibFi/Annif - Scrutinizer

Passed

Branch — tfidf-backend (2ab86e)

by Osma

created 2018-03-20 07:42 UTC

TFIDFBackend A

Complexity

Total Complexity

Size/Duplication

Total Lines	19
Duplicated Lines	0 %

Importance

Changes	2
Bugs	0	Features	0

Metric	Value
c (changes)	2
b (bugs)	0
f (features)	0
dl (duplicated lines)	0
loc (lines of code)	19
rs	10
wmc (weighted method count)	3

3 Methods

Rating	Name	Size	Complexity
A	analyze()	2	1
A	load_subjects()	7	1
A	_atomic_save()	5	1

"""Backend that returns most similar subjects based on similarity in sparse
TF-IDF normalized bag-of-words vector space"""

import os
import os.path
import tempfile
import gensim.corpora
import gensim.models
from . import backend


class VectorCorpus:

    """A class that wraps a text corpus so it can be iterated as lists of
    vectors, by using a dictionary to map words to integers."""

    def __init__(self, corpus, dictionary):
        self.corpus = corpus
        self.dictionary = dictionary

    def __iter__(self):
        for doc in self.corpus:
            yield self.dictionary.doc2bow(doc)


class TFIDFBackend(backend.AnnifBackend):
    name = "tfidf"

    def _atomic_save(self, obj, dirname, filename):
        tempfd, tempfilename = tempfile.mkstemp(prefix=filename, dir=dirname)
        os.close(tempfd)
        obj.save(tempfilename)
        os.rename(tempfilename, os.path.join(dirname, filename))

    def load_subjects(self, subjects, analyzer):
        corpus = subjects.tokens(analyzer)
        dictionary = gensim.corpora.Dictionary(corpus)
        self._atomic_save(dictionary, self._get_datadir(), 'dictionary')
        veccorpus = VectorCorpus(corpus, dictionary)
        tfidf = gensim.models.TfidfModel(veccorpus)
        self._atomic_save(tfidf, self._get_datadir(), 'tfidf')

    def analyze(self, text):
        return []  # TODO
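
The VectorCorpus docstring describes a wrapper that turns a tokenized corpus into bag-of-words vectors on each iteration. The following is a minimal sketch of that behaviour, not part of the inspected code. To stay runnable without gensim it uses `ToyDictionary`, a hypothetical stand-in that only imitates the `doc2bow` method of `gensim.corpora.Dictionary`; the sample documents are likewise invented for illustration.

```python
class ToyDictionary:
    """Hypothetical stand-in for gensim.corpora.Dictionary (illustration only)."""

    def __init__(self, documents):
        self.token2id = {}
        for doc in documents:
            for token in doc:
                # Assign the next free integer id to each previously unseen token.
                self.token2id.setdefault(token, len(self.token2id))

    def doc2bow(self, doc):
        """Return sorted (token_id, count) pairs for the known tokens in doc."""
        counts = {}
        for token in doc:
            if token in self.token2id:
                token_id = self.token2id[token]
                counts[token_id] = counts.get(token_id, 0) + 1
        return sorted(counts.items())


class VectorCorpus:
    """Same wrapper as in the listing above."""

    def __init__(self, corpus, dictionary):
        self.corpus = corpus
        self.dictionary = dictionary

    def __iter__(self):
        for doc in self.corpus:
            yield self.dictionary.doc2bow(doc)


docs = [["cat", "sat", "cat"], ["dog", "sat"]]
veccorpus = VectorCorpus(docs, ToyDictionary(docs))
first_pass = list(veccorpus)
second_pass = list(veccorpus)  # each pass starts a fresh iterator
print(first_pass)  # [[(0, 2), (1, 1)], [(1, 1), (2, 1)]]
```

Because `__iter__` is a generator method on a class, every `for` loop over the wrapper restarts from the beginning, provided the wrapped corpus can itself be iterated more than once. That matters in `load_subjects()`, where the corpus is read once to build the dictionary and again to build the TF-IDF model.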



Issues

All issues were introduced 2018-03-20 07:45 UTC. Line numbers are those reported by the inspection.

Line 12, class VectorCorpus (1 ignored issue)
Unused Code: The variable `__class__` seems to be unused.

Line 25, class TFIDFBackend (1 ignored issue)
Coding Style: This class should have a docstring. The coding style of this project requires that you add a docstring to this code element. Example for methods:

    class SomeClass:
        def some_method(self):
            """Do x and return foo."""

See PEP-257: Docstring Conventions for more about docstrings.
Unused Code: The variable `__class__` seems to be unused.

Line 28, _atomic_save() (0 ignored issues)
Coding Style: This method could be written as a function/class method. If a method does not access any attributes of the class, it could also be implemented as a function or static method. This can help improve readability. For example

    class Foo:
        def some_method(self, x, y):
            return x + y

could be written as

    class Foo:
        @classmethod
        def some_method(cls, x, y):
            return x + y

Line 43, analyze() (0 ignored issues)
Coding Style: `TODO` and `FIXME` comments should generally be avoided.
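
The finding on `_atomic_save()` notes that the method does not use any attributes of its class. One way to act on it, shown here only as a sketch, is to move the logic into a module-level function. The function name `atomic_save`, the `TextBlob` helper and the file name `model` are hypothetical and not part of Annif. The sketch also substitutes `os.replace` for the `os.rename` call in the inspected code, which is a deliberate change rather than the original behaviour.

```python
import os
import tempfile


def atomic_save(obj, dirname, filename):
    """Save obj under dirname/filename via a temporary file in the same directory."""
    tempfd, tempfilename = tempfile.mkstemp(prefix=filename, dir=dirname)
    os.close(tempfd)
    obj.save(tempfilename)
    # os.replace is used here instead of os.rename (a substitution, not the
    # inspected code) because it also overwrites an existing target on Windows.
    os.replace(tempfilename, os.path.join(dirname, filename))


class TextBlob:
    """Hypothetical object with a gensim-style save(path) method."""

    def __init__(self, text):
        self.text = text

    def save(self, path):
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(self.text)


with tempfile.TemporaryDirectory() as datadir:
    atomic_save(TextBlob("first"), datadir, "model")
    atomic_save(TextBlob("second"), datadir, "model")
    with open(os.path.join(datadir, "model"), encoding="utf-8") as handle:
        saved = handle.read()
    leftovers = sorted(os.listdir(datadir))

print(saved, leftovers)  # second ['model']
```

Creating the temporary file in the target directory keeps the final rename on one filesystem, so readers see either the old file or the complete new one, never a partially written file.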
