Completed
Push — youngblood/update_intermediate... ( 7dbc86...ee8400 )
by
unknown
01:39
created

topik.vectorizers.VectorizerOutput.__init__()   D

Complexity

Conditions 11

Size

Total Lines 22

Duplication

Lines 0
Ratio 0 %
Metric Value
cc 11
dl 0
loc 22
rs 4.0714

How to fix   Complexity   

Complexity

Complex classes like topik.vectorizers.VectorizerOutput.__init__() often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to find such a component is to look for fields/methods that share the same prefixes, or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

1
from collections import Counter
0 ignored issues
show
Coding Style introduced by
This module should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below, you find an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend to read PEP-257: Docstring Conventions.

Loading history...
2
import itertools
3
4
def _accumulate_terms(tokenized_corpus):
0 ignored issues
show
Coding Style introduced by
This function should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below, you find an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend to read PEP-257: Docstring Conventions.

Loading history...
5
    global_terms=set()
0 ignored issues
show
Coding Style introduced by
Exactly one space required around assignment
global_terms=set()
^
Loading history...
6
    document_term_counts = {}
7
    doc_lengths = {}
8
    global_term_frequency_counter = Counter()
9
    for doc_id, doc in tokenized_corpus:
10
        doc_terms = set(doc)
11
        global_terms.update(doc_terms)
12
        doc_lengths[doc_id] = len(doc)
13
        document_term_counts[doc_id] = len(doc_terms)
14
        global_term_frequency_counter.update(doc)
15
    id_term_map = {}
16
    global_term_frequency = {}
17
    for term_id, term in enumerate(global_terms):
18
        id_term_map[term_id] = term
19
        global_term_frequency[term_id] = global_term_frequency_counter[term]
20
21
    return id_term_map, document_term_counts, doc_lengths, global_term_frequency
22
23
24
class VectorizerOutput(object):
0 ignored issues
show
Coding Style introduced by
This class should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below, you find an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend to read PEP-257: Docstring Conventions.

Loading history...
25
    # TODO: replace the __iter__ of the TokenizerOutput with a get_tokens()?
0 ignored issues
show
Coding Style introduced by
TODO and FIXME comments should generally be avoided.
Loading history...
26
    def __init__(self, tokenized_corpus=None, vectorizer_func=None,
27
                 id_term_map=None, document_term_counts=None, doc_lengths=None,
28
                 term_frequency=None, vectors=None):
29
        if tokenized_corpus and vectorizer_func and not vectors:
30
            iter1, iter2 = itertools.tee(tokenized_corpus)
31
            self._id_term_map, self._document_term_counts, self._doc_lengths, \
32
                self._term_frequency = _accumulate_terms(iter1)
33
            self._term_id_map = {term: id
34
                                 for id, term in self._id_term_map.items()}
35
            self._vectors = vectorizer_func(iter2, self)
36
        elif id_term_map and document_term_counts and doc_lengths and \
37
                term_frequency and vectors:
38
            self._id_term_map = id_term_map
39
            self._term_id_map = {term: id
40
                                 for id, term in self._id_term_map.items()}
41
            self._document_term_counts = document_term_counts
42
            self._doc_lengths = doc_lengths
43
            self._term_frequency = term_frequency
44
            self._vectors = vectors
45
        else:
46
            raise ValueError(
47
                "Must provide either tokenized corpora and vectorizer func, "
48
                "or global term collection, document term counts, and vectors.")
49
50
    def get_vectors(self):
0 ignored issues
show
Coding Style introduced by
This method should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below, you find an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend to read PEP-257: Docstring Conventions.

Loading history...
51
        for doc_id, vector in self._vectors.items():
52
            yield doc_id, vector
53
54
    def __len__(self):
55
        return len(self._vectors)
56
57
    @property
58
    def id_term_map(self):
0 ignored issues
show
Coding Style introduced by
This method should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below, you find an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend to read PEP-257: Docstring Conventions.

Loading history...
59
        return self._id_term_map
60
61
    @property
62
    def term_id_map(self):
0 ignored issues
show
Coding Style introduced by
This method should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below, you find an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend to read PEP-257: Docstring Conventions.

Loading history...
63
        return self._term_id_map
64
65
    @property
66
    def global_term_count(self):
0 ignored issues
show
Coding Style introduced by
This method should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below, you find an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend to read PEP-257: Docstring Conventions.

Loading history...
67
        return len(self.id_term_map)
68
69
    @property
70
    def document_term_counts(self):
0 ignored issues
show
Coding Style introduced by
This method should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below, you find an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend to read PEP-257: Docstring Conventions.

Loading history...
71
        return self._document_term_counts
72
73
    @property
74
    def doc_lengths(self):
0 ignored issues
show
Coding Style introduced by
This method should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below, you find an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend to read PEP-257: Docstring Conventions.

Loading history...
75
        return self._doc_lengths
76
77
    @property
78
    def term_frequency(self):
0 ignored issues
show
Coding Style introduced by
This method should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below, you find an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend to read PEP-257: Docstring Conventions.

Loading history...
79
        return self._term_frequency
80
81
    @property
82
    def vectors(self):
0 ignored issues
show
Coding Style introduced by
This method should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below, you find an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend to read PEP-257: Docstring Conventions.

Loading history...
83
        return self._vectors
84