Passed
Branch tfidf-backend (2ab86e)
by Osma
03:35
created

VectorCorpus.__init__()   A

Complexity

Conditions 1

Size

Total Lines 3

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric  Value
cc      1
c       0
b       0
f       0
dl      0
loc     3
rs      10
"""Backend that returns most similar subjects based on similarity in sparse
TF-IDF normalized bag-of-words vector space"""

import os
import os.path
import tempfile
import gensim.corpora
import gensim.models
from . import backend


class VectorCorpus:
1 ignored issue
Unused Code: The variable __class__ seems to be unused.
    """A class that wraps a text corpus so it can be iterated as lists of
    vectors, by using a dictionary to map words to integers."""

    def __init__(self, corpus, dictionary):
        self.corpus = corpus
        self.dictionary = dictionary

    def __iter__(self):
        for doc in self.corpus:
            yield self.dictionary.doc2bow(doc)


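As a rough illustration of what the VectorCorpus wrapper produces, the sketch below runs the same class against a toy dictionary. ToyDictionary is a hypothetical stand-in written only for this example; it is not gensim's Dictionary, and its id assignment (order of first appearance) is an assumption made to keep the output predictable. The only behaviour borrowed from the listing is that doc2bow maps a token list to (token_id, count) pairs.

```python
from collections import Counter


class ToyDictionary:
    """Hypothetical stand-in for gensim.corpora.Dictionary (illustration only)."""

    def __init__(self, documents):
        # Assign integer ids to tokens in order of first appearance.
        self.token2id = {}
        for doc in documents:
            for token in doc:
                self.token2id.setdefault(token, len(self.token2id))

    def doc2bow(self, doc):
        # Count known tokens and return sorted (token_id, count) pairs.
        counts = Counter(self.token2id[t] for t in doc if t in self.token2id)
        return sorted(counts.items())


class VectorCorpus:
    """Same shape as the class in the listing: lazy, per-document conversion."""

    def __init__(self, corpus, dictionary):
        self.corpus = corpus
        self.dictionary = dictionary

    def __iter__(self):
        for doc in self.corpus:
            yield self.dictionary.doc2bow(doc)


docs = [["cat", "sat", "cat"], ["dog", "sat"]]
toy_dictionary = ToyDictionary(docs)
vectors = VectorCorpus(docs, toy_dictionary)

first_pass = list(vectors)
second_pass = list(vectors)  # a second pass works because docs is a list
```

Note that the wrapper can only be iterated more than once if the underlying corpus can. Here docs is a list, so both passes yield the same vectors; a one-shot generator would be exhausted after the first pass. Whether subjects.tokens(analyzer) returns a re-iterable object is not visible in this listing.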
class TFIDFBackend(backend.AnnifBackend):
1 ignored issue
Coding Style: This class should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below is an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend reading PEP-257: Docstring Conventions.

Unused Code: The variable __class__ seems to be unused.
    name = "tfidf"

    def _atomic_save(self, obj, dirname, filename):
0 ignored issues
Coding Style: This method could be written as a function or class method.

If a method does not access any attributes of the class, it could also be implemented as a function or static method. This can help improve readability. For example,

class Foo:
    def some_method(self, x, y):
        return x + y

could be written as

class Foo:
    @classmethod
    def some_method(cls, x, y):
        return x + y
        # Write to a temporary file in the target directory, then rename it
        # into place so that readers never see a partially written file.
        tempfd, tempfilename = tempfile.mkstemp(prefix=filename, dir=dirname)
        os.close(tempfd)
        obj.save(tempfilename)
        os.rename(tempfilename, os.path.join(dirname, filename))

    def load_subjects(self, subjects, analyzer):
        corpus = subjects.tokens(analyzer)
        # Build and store the word-to-integer dictionary.
        dictionary = gensim.corpora.Dictionary(corpus)
        self._atomic_save(dictionary, self._get_datadir(), 'dictionary')
        # Train and store the TF-IDF model on the bag-of-words vectors.
        veccorpus = VectorCorpus(corpus, dictionary)
        tfidf = gensim.models.TfidfModel(veccorpus)
        self._atomic_save(tfidf, self._get_datadir(), 'tfidf')

    def analyze(self, text):
        return []  # TODO
0 ignored issues
Coding Style: TODO and FIXME comments should generally be avoided.
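
The write-then-rename pattern used by _atomic_save can be tried in isolation with the standard library alone. The sketch below is not the project's code: atomic_write_bytes is a hypothetical helper that writes raw bytes instead of calling obj.save(), and it swaps in os.replace for os.rename, since os.replace also overwrites an existing target on Windows. It follows the review hint by being a plain function rather than a method.

```python
import os
import tempfile


def atomic_write_bytes(data, dirname, filename):
    """Write data to dirname/filename without exposing a partial file."""
    # Create the temporary file in the target directory so the final
    # rename stays on the same filesystem.
    fd, temp_path = tempfile.mkstemp(prefix=filename, dir=dirname)
    target_path = os.path.join(dirname, filename)
    try:
        with os.fdopen(fd, "wb") as temp_file:
            temp_file.write(data)
        # Move the finished file into place, replacing any older version.
        os.replace(temp_path, target_path)
    except BaseException:
        # Remove the temporary file if writing or renaming failed.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
    return target_path


# Usage: the second call replaces the first file and leaves no temp file behind.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = atomic_write_bytes(b"v1", demo_dir, "model")
    atomic_write_bytes(b"v2", demo_dir, "model")
    with open(demo_path, "rb") as handle:
        final_content = handle.read()
    remaining_files = os.listdir(demo_dir)
```

Keeping the temporary file in the same directory as the target matters here: a rename across filesystems would not be a single atomic step.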