Completed
Push — youngblood/update_intermediate... ( 46ae62...676980 )
by unknown
01:55
created

term_frequency()   A

Complexity
    Conditions: 1

Size
    Total Lines: 3

Duplication
    Lines: 0
    Ratio: 0 %

Metric  Value
cc      1
dl      0
loc     3
rs      10

from collections import Counter
Coding Style: This module should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below, you find an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend reading PEP-257: Docstring Conventions.
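
Applied to the class reviewed here, the convention might look like the sketch below. The docstring wording is an assumption inferred from the code, not the author's text, and the constructor is deliberately simplified to keep the example self-contained.

```python
"""Vectorizer output container (hypothetical one-line module summary)."""


class VectorizerOutput(object):
    """Hold document vectors and corpus statistics (illustrative wording)."""

    def __init__(self, vectors):
        # Simplified constructor for this sketch only; the reviewed class
        # accepts several more keyword arguments.
        self._vectors = vectors

    @property
    def vectors(self):
        """Return the mapping of document IDs to vectors."""
        return self._vectors
```

A property picks up the docstring of its getter, so documenting the decorated method is enough to satisfy the check for each accessor.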

import itertools

def _accumulate_terms(tokenized_corpus):
Coding Style: This function should have a docstring.
    global_terms=set()
Coding Style: Exactly one space required around assignment
global_terms=set()
            ^
    document_term_counts = {}
    doc_lengths = {}
    global_term_frequency_counter = Counter()
    for doc_id, doc in tokenized_corpus:
        doc_terms = set(doc)
        global_terms.update(doc_terms)
        doc_lengths[doc_id] = len(doc)
        document_term_counts[doc_id] = len(doc_terms)
        global_term_frequency_counter.update(doc)
    id_term_map = {}
    global_term_frequency = {}
    for term_id, term in enumerate(global_terms):
        id_term_map[term_id] = term
        global_term_frequency[term_id] = global_term_frequency_counter[term]

    return id_term_map, document_term_counts, doc_lengths, global_term_frequency


class VectorizerOutput(object):
Coding Style: This class should have a docstring.
    # TODO: replace the __iter__ of the TokenizerOutput with a get_tokens()?
Coding Style: TODO and FIXME comments should generally be avoided.
    def __init__(self, tokenized_corpus=None, vectorizer_func=None,
                 id_term_map=None, document_term_counts=None, doc_lengths=None,
                 term_frequency=None, vectors=None):
        if tokenized_corpus and vectorizer_func and not vectors:
            iter1, iter2 = itertools.tee(tokenized_corpus)
            self._id_term_map, self._document_term_counts, self._doc_lengths, self._term_frequency = _accumulate_terms(iter1)
            self._vectors = vectorizer_func(iter2, self)
        elif id_term_map and document_term_counts and doc_lengths and term_frequency and vectors:
            self._id_term_map = id_term_map
            self._document_term_counts = document_term_counts
            self._doc_lengths = doc_lengths
            self._term_frequency = term_frequency
            self._vectors = vectors
        else:
            raise ValueError("Must provide either tokenized corpora and vectorizer func, "
                             "or global term collection, document term counts, and vectors.")

    def get_vectors(self):
Coding Style: This method should have a docstring.
        for doc_id, vector in self._vectors.items():
            yield doc_id, vector

    def __len__(self):
        return len(self._vectors)

    @property
    def id_term_map(self):
Coding Style: This method should have a docstring.
        return self._id_term_map

    @property
    def term_id_map(self):
Coding Style: This method should have a docstring.
        return {term: id for id, term in self._id_term_map.items()}

    @property
    def global_term_count(self):
Coding Style: This method should have a docstring.
        return len(self.id_term_map)

    @property
    def document_term_counts(self):
Coding Style: This method should have a docstring.
        return self._document_term_counts

    @property
    def doc_lengths(self):
Coding Style: This method should have a docstring.
        return self._doc_lengths

    @property
    def term_frequency(self):
Coding Style: This method should have a docstring.
        return self._term_frequency

    @property
    def vectors(self):
Coding Style: This method should have a docstring.
        return self._vectors
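
To make the accumulation step in the listing easier to follow, here is a minimal standalone sketch of it run on a tiny corpus, with the flagged style issues (missing docstring, spacing around `=`) addressed. The name `accumulate_terms_sketch`, the sample corpus, and the `sorted()` call are assumptions added for illustration; the reviewed code enumerates the set directly, so its term IDs are not guaranteed to be stable between runs.

```python
from collections import Counter


def accumulate_terms_sketch(tokenized_corpus):
    """Return (id_term_map, document_term_counts, doc_lengths, global_term_frequency)."""
    global_terms = set()  # one space on each side of "=", as the style check asks
    document_term_counts = {}
    doc_lengths = {}
    counter = Counter()
    for doc_id, doc in tokenized_corpus:
        doc_terms = set(doc)
        global_terms.update(doc_terms)
        doc_lengths[doc_id] = len(doc)                 # total tokens in the document
        document_term_counts[doc_id] = len(doc_terms)  # distinct terms in the document
        counter.update(doc)
    # sorted() is an addition in this sketch so that term IDs are reproducible.
    id_term_map = dict(enumerate(sorted(global_terms)))
    global_term_frequency = {tid: counter[term] for tid, term in id_term_map.items()}
    return id_term_map, document_term_counts, doc_lengths, global_term_frequency


# Hypothetical corpus of (doc_id, tokens) pairs.
corpus = [("d1", ["a", "b", "a"]), ("d2", ["b", "c"])]
id_term_map, term_counts, lengths, frequency = accumulate_terms_sketch(corpus)
# id_term_map == {0: "a", 1: "b", 2: "c"}
# frequency   == {0: 2, 1: 2, 2: 1}
```

Note that `document_term_counts` holds the number of distinct terms per document, while `doc_lengths` holds the raw token count, so the two differ whenever a document repeats a term.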