VectorizerOutput.__init__()   D
last analyzed

Complexity    Conditions 11
Size          Total Lines 21
Duplication   Lines 0, Ratio 0 %
Importance    Changes 6, Bugs 3, Features 0

Metric   Value
c        6        (changes)
b        3        (bugs)
f        0        (features)
dl       0        (duplicated lines)
loc      21       (lines of code)
rs       4.7532
cc       11       (conditions)

How to fix: Complexity

Complex methods like VectorizerOutput.__init__() often do several different things. This constructor either computes term statistics and vectors from a tokenized corpus or stores statistics and vectors that were computed earlier, and the boolean checks that choose between those two modes make up much of its complexity. To break such code down, identify a cohesive component within the class. A common approach is to look for fields or methods that share the same prefix or suffix.

Once you have determined which fields belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate and is often faster.
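The five term statistics (`id_term_map`, `term_id_map`, `document_term_counts`, `doc_lengths`, `term_frequency`) are always assigned together, which makes them a natural candidate for Extract Class. The following is one hypothetical way to apply it, not the project's code: `CorpusStats` and `VectorizerOutputSketch` are illustrative names. Each construction mode becomes its own `from_corpus` classmethod, so neither `__init__` needs to branch. The vectorizer func receives the `CorpusStats` object rather than the output object. The corpus is materialized with `list()` instead of `itertools.tee`; tee would buffer every item anyway, because the first pass finishes before the second begins. The sketch also sorts the vocabulary so that term ids are reproducible, which deliberately differs from the set order used in `_accumulate_terms`.

```python
from collections import Counter


class CorpusStats:
    """Hypothetical extracted class: the term statistics of one corpus."""

    def __init__(self, id_term_map, document_term_counts, doc_lengths,
                 term_frequency):
        self.id_term_map = id_term_map
        self.term_id_map = {term: term_id
                            for term_id, term in id_term_map.items()}
        self.document_term_counts = document_term_counts
        self.doc_lengths = doc_lengths
        self.term_frequency = term_frequency

    @classmethod
    def from_corpus(cls, tokenized_corpus):
        """Compute the statistics from (doc_id, tokens) pairs in one pass."""
        counter = Counter()
        document_term_counts, doc_lengths = {}, {}
        for doc_id, tokens in tokenized_corpus:
            counter.update(tokens)
            doc_lengths[doc_id] = len(tokens)
            document_term_counts[doc_id] = len(set(tokens))
        # Sorting gives reproducible term ids (unlike raw set order).
        id_term_map = dict(enumerate(sorted(counter)))
        term_frequency = {term_id: counter[term]
                          for term_id, term in id_term_map.items()}
        return cls(id_term_map, document_term_counts, doc_lengths,
                   term_frequency)


class VectorizerOutputSketch:
    """Hypothetical slimmed-down VectorizerOutput: statistics plus vectors."""

    def __init__(self, stats, vectors):
        self.stats = stats
        self.vectors = vectors

    @classmethod
    def from_corpus(cls, tokenized_corpus, vectorizer_func):
        """Build statistics, then vectors, from a tokenized corpus."""
        docs = list(tokenized_corpus)  # read twice: stats pass, vector pass
        stats = CorpusStats.from_corpus(docs)
        return cls(stats, vectorizer_func(docs, stats))

    def __len__(self):
        return len(self.vectors)


docs = [("d1", ["a", "b", "a"]), ("d2", ["b", "c"])]
out = VectorizerOutputSketch.from_corpus(
    docs, lambda corpus, stats: {doc_id: len(toks) for doc_id, toks in corpus})
print(out.stats.id_term_map)     # {0: 'a', 1: 'b', 2: 'c'}
print(out.stats.term_frequency)  # {0: 2, 1: 2, 2: 1}
```

Because each path is a separate entry point, precomputed statistics go through `VectorizerOutputSketch(stats, vectors)` and a raw corpus goes through `VectorizerOutputSketch.from_corpus(docs, vectorizer_func)`. Invalid argument combinations can no longer be expressed, so the `ValueError` branch is no longer needed.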

The analyzer's "The variable X does not seem to be defined" reports on this file are false positives: each flagged name (self, object, property, the constructor parameters, and the loop and comprehension variables) is a builtin, a parameter, or a bound loop variable.

"""Accumulate corpus-wide term statistics and hold vectorizer output."""
from collections import Counter
import itertools


def _accumulate_terms(tokenized_corpus):
    """Collect corpus-wide term statistics.

    ``tokenized_corpus`` is an iterable of ``(doc_id, tokens)`` pairs, where
    ``tokens`` is a sequence (it must support ``len()``).
    Returns ``(id_term_map, document_term_counts, doc_lengths,
    global_term_frequency)``:

    * ``id_term_map``: term id -> term
    * ``document_term_counts``: doc_id -> number of distinct terms
    * ``doc_lengths``: doc_id -> number of tokens
    * ``global_term_frequency``: term id -> total occurrences in the corpus
    """
    global_terms = set()
    document_term_counts = {}
    doc_lengths = {}
    global_term_frequency_counter = Counter()
    for doc_id, doc in tokenized_corpus:
        doc_terms = set(doc)
        global_terms.update(doc_terms)
        doc_lengths[doc_id] = len(doc)
        document_term_counts[doc_id] = len(doc_terms)
        global_term_frequency_counter.update(doc)
    id_term_map = {}
    global_term_frequency = {}
    # Term ids follow set iteration order, which for str terms can change
    # between interpreter runs because of hash randomization.
    for term_id, term in enumerate(global_terms):
        id_term_map[term_id] = term
        global_term_frequency[term_id] = global_term_frequency_counter[term]

    return id_term_map, document_term_counts, doc_lengths, global_term_frequency


class VectorizerOutput(object):
    """Corpus term statistics plus one vector per document."""
    def __init__(self, tokenized_corpus=None, vectorizer_func=None,
                 id_term_map=None, document_term_counts=None, doc_lengths=None,
                 term_frequency=None, vectors=None):
        """Build from a corpus, or wrap precomputed statistics and vectors.

        Either pass ``tokenized_corpus`` (an iterable of ``(doc_id, tokens)``
        pairs) and ``vectorizer_func``, which is called as
        ``vectorizer_func(corpus, self)`` and must return a
        ``{doc_id: vector}`` mapping; or pass ``id_term_map``,
        ``document_term_counts``, ``doc_lengths``, ``term_frequency`` and
        ``vectors`` directly.
        """
        # ``is not None`` rather than truthiness, so that an empty corpus or
        # empty precomputed mappings are accepted instead of rejected.
        if (tokenized_corpus is not None and vectorizer_func is not None
                and vectors is None):
            # tee() lets a one-shot iterator be read twice: once for the
            # statistics and once by the vectorizer.
            iter1, iter2 = itertools.tee(tokenized_corpus)
            (self._id_term_map, self._document_term_counts,
             self._doc_lengths, self._term_frequency) = _accumulate_terms(iter1)
            self._term_id_map = {term: term_id
                                 for term_id, term in self._id_term_map.items()}
            # The statistics are already set, so vectorizer_func can read
            # them through this object's properties.
            self._vectors = vectorizer_func(iter2, self)
        elif (id_term_map is not None and document_term_counts is not None
              and doc_lengths is not None and term_frequency is not None
              and vectors is not None):
            self._id_term_map = id_term_map
            self._term_id_map = {term: term_id
                                 for term_id, term in id_term_map.items()}
            self._document_term_counts = document_term_counts
            self._doc_lengths = doc_lengths
            self._term_frequency = term_frequency
            self._vectors = vectors
        else:
            raise ValueError(
                "Must provide either a tokenized corpus and a vectorizer func, "
                "or id_term_map, document_term_counts, doc_lengths, "
                "term_frequency, and vectors.")

    def get_vectors(self):
        """Yield ``(doc_id, vector)`` pairs."""
        for doc_id, vector in self._vectors.items():
            yield doc_id, vector

    def __len__(self):
        return len(self._vectors)

    @property
    def id_term_map(self):
        """Mapping from term id to term."""
        return self._id_term_map

    @property
    def term_id_map(self):
        """Mapping from term to term id (the inverse of ``id_term_map``)."""
        return self._term_id_map

    @property
    def global_term_count(self):
        """Number of distinct terms in the corpus vocabulary."""
        return len(self.id_term_map)

    @property
    def document_term_counts(self):
        """Mapping from doc_id to the number of distinct terms in it."""
        return self._document_term_counts

    @property
    def doc_lengths(self):
        """Mapping from doc_id to the document's length in tokens."""
        return self._doc_lengths

    @property
    def term_frequency(self):
        """Mapping from term id to the term's total count in the corpus."""
        return self._term_frequency

    @property
    def vectors(self):
        """Mapping from doc_id to that document's vector."""
        return self._vectors
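In the corpus mode, `VectorizerOutput` calls `vectorizer_func(corpus, self)` only after the term statistics are set, and `get_vectors()` iterates the result's `.items()`. A vectorizer func is therefore expected to return a `{doc_id: vector}` mapping. The following is a minimal sketch of such a function. It assumes sparse dict vectors of term counts normalized by document length. The name `normalized_tf_vectorizer` and the vector format are hypothetical, and a `SimpleNamespace` stands in for a populated `VectorizerOutput` so that the block runs on its own.

```python
from collections import Counter
from types import SimpleNamespace


def normalized_tf_vectorizer(tokenized_corpus, output):
    """Hypothetical vectorizer_func producing sparse term-frequency vectors.

    Returns {doc_id: {term_id: count / doc_length}}, reading the vocabulary
    and document lengths from the (partially built) output object.
    """
    vectors = {}
    for doc_id, tokens in tokenized_corpus:
        length = output.doc_lengths[doc_id]
        counts = Counter(output.term_id_map[term] for term in tokens)
        # An empty document yields an empty dict, so no division by zero.
        vectors[doc_id] = {term_id: n / length for term_id, n in counts.items()}
    return vectors


# Stand-in for a VectorizerOutput whose statistics are already populated.
stub = SimpleNamespace(term_id_map={"a": 0, "b": 1, "c": 2},
                       doc_lengths={"d1": 4, "d2": 2})
corpus = [("d1", ["a", "b", "a", "a"]), ("d2", ["b", "c"])]
vectors = normalized_tf_vectorizer(corpus, stub)
print(vectors)  # {'d1': {0: 0.75, 1: 0.25}, 'd2': {1: 0.5, 2: 0.5}}
```

With the module above, `VectorizerOutput(tokenized_corpus=corpus, vectorizer_func=normalized_tf_vectorizer)` should build the statistics and the vectors in one step. The term ids would then come from `_accumulate_terms` rather than from the stub.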