ModeledElasticCorpora   A
last analyzed

Complexity

Total Complexity   8

Size/Duplication

Total Lines        26
Duplicated Lines   100 %

Metric   Value
wmc      8
dl       26
loc      26
rs       10

3 Methods

Rating   Name            Duplication   Size   Complexity
B        __getitem__()   15            15     6
A        __setitem__()   6             6      1
A        __lt__()        2             2      1

Duplicated Code

Duplicate code is one of the most pungent code smells. A commonly used rule is to restructure code once it is duplicated in three or more places.

Common duplication problems and their corresponding solutions are:

from six.moves import UserDict
[Coding Style] This module should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below is an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend reading PEP 257: Docstring Conventions.

[Configuration] The import six.moves could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a Pylint configuration issue. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run Pylint, so when installing your modules, make sure to use the command for the correct Python version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one such file in each sub-folder.
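If the analyzer environment lacks `six`, the unresolved import can also be sidestepped by importing `UserDict` from the standard library directly. A minimal sketch, assuming `UserDict` is the only name this module needs from `six.moves`: Python 3 provides the class in `collections`, Python 2 in its own `UserDict` module.

```python
try:
    # Python 3: UserDict lives in the collections module
    from collections import UserDict
except ImportError:
    # Python 2: UserDict is a top-level standard-library module
    from UserDict import UserDict

# Sanity check: the imported class behaves like a mutable mapping
store = UserDict()
store["term"] = {0: "apple"}
print(store["term"][0])  # apple
```

Installing `six` through `before_commands` is the smaller change; this variant only removes the third-party dependency for this one import.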
import logging
import time

from elasticsearch import Elasticsearch, helpers

from ._registry import register_output
from .base_output import OutputInterface
from topik.vectorizers.vectorizer_output import VectorizerOutput
from topik.models.base_model_output import ModelOutput

def es_setitem(key, value, doc_type, instance, index, batch_size=1000):
    """Load an iterable of (id, value) pairs into the specified new or
    existing field within existing documents."""
    batch = []
    for id, val in value:
[Bug Best Practice] This seems to re-define the built-in id.

It is generally discouraged to redefine built-ins, as this makes code very hard to read.

[Coding Style Naming] The name id does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.

        action = {'_op_type': 'update',
                  '_index': index,
                  '_type': doc_type,
                  '_id': id,
                  'doc': {key: val},
                  'doc_as_upsert': "true",
                  }
        batch.append(action)
        if len(batch) >= batch_size:
            helpers.bulk(client=instance, actions=batch,
                         index=index)
            batch = []
    if batch:
        helpers.bulk(client=instance, actions=batch, index=index)
    instance.indices.refresh(index)
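The batching in `es_setitem` can be exercised without a cluster by separating action construction from sending. A hedged sketch of such a refactor, not topik's code: `build_update_actions` and `chunked` are hypothetical helpers, the loop variable is renamed to `doc_id` so it no longer shadows the built-in `id`, and `doc_as_upsert` is written as a JSON boolean rather than the string `"true"`.

```python
from itertools import islice


def build_update_actions(key, pairs, doc_type, index):
    """Yield one bulk 'update' action per (doc_id, value) pair.

    Same action shape that es_setitem builds: upsert {key: value}
    into the document whose id is doc_id.
    """
    for doc_id, val in pairs:  # doc_id avoids shadowing the built-in id
        yield {'_op_type': 'update',
               '_index': index,
               '_type': doc_type,
               '_id': doc_id,
               'doc': {key: val},
               'doc_as_upsert': True}


def chunked(iterable, size):
    """Yield lists of at most `size` items, like the batch list in es_setitem."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


actions = build_update_actions("term", enumerate(["a", "b", "c"]),
                               "term", "topik_index")
print([len(batch) for batch in chunked(actions, 2)])  # [2, 1]
```

Each batch could then be sent with `helpers.bulk(instance, batch)`. Note that elasticsearch-py's `helpers.bulk` already splits its input into chunks (its `chunk_size` keyword is forwarded to `streaming_bulk`), so passing the generator directly with `chunk_size=batch_size` may make the manual batch list unnecessary; check this against the client version in use. Newer Elasticsearch versions removed mapping types, so `_type` may also need to be dropped there.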

def es_getitem(key, doc_type, instance, index, query=None):
[Coding Style] This function should have a docstring.
    results = helpers.scan(instance, index=index,
                           query=query, doc_type=doc_type)
    for result in results:
        try:
            id = int(result["_id"])
[Bug Best Practice] This seems to re-define the built-in id.
[Coding Style Naming] The name id does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
        except ValueError:
[Comprehensibility Best Practice] The variable ValueError does not seem to be defined.
            id = result["_id"]
[Coding Style Naming] The name id does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
        yield id, result['_source'][key]
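Both naming findings in `es_getitem` disappear if the id is bound to a descriptive name, and the missing docstring can be added at the same time. A hedged sketch: `parse_hit` is a hypothetical helper (not part of topik) that factors out the per-hit logic, and the hit dicts are hand-written stand-ins for what `helpers.scan` yields, containing only the `_id` and `_source` fields the original reads.

```python
def parse_hit(hit, key):
    """Return (doc_id, value) for one Elasticsearch search hit.

    Numeric document ids are converted to int; other ids are kept as
    strings, matching the try/except in es_getitem.
    """
    raw_id = hit["_id"]
    try:
        doc_id = int(raw_id)
    except ValueError:
        doc_id = raw_id
    return doc_id, hit["_source"][key]


hits = [{"_id": "3", "_source": {"term": "apple"}},
        {"_id": "abc", "_source": {"term": "pear"}}]
print([parse_hit(h, "term") for h in hits])  # [(3, 'apple'), ('abc', 'pear')]
```

Inside `es_getitem`, the loop body would then reduce to `yield parse_hit(result, key)`.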

class BaseElasticCorpora(UserDict):
[Coding Style] This class should have a docstring.
[Comprehensibility Best Practice] The variable UserDict does not seem to be defined.
    def __init__(self, instance, index, corpus_type, query=None,
                 batch_size=1000):
        self.instance = instance
        self.index = index
        self.corpus_type = corpus_type
        self.query = query
        self.batch_size = batch_size
        pass
[Comprehensibility Best Practice] The variables instance, index, corpus_type, query and batch_size do not seem to be defined.
[Unused Code] Unnecessary pass statement.

    def __setitem__(self, key, value):
        es_setitem(key, value, self.corpus_type, self.instance, self.index)
[Comprehensibility Best Practice] The variables self, key and value do not seem to be defined.


    def __getitem__(self, key):
        return es_getitem(key,self.corpus_type,self.instance,self.index,
                          self.query)
[Coding Style] Exactly one space required after comma (3 occurrences).
[Comprehensibility Best Practice] The variables key and self do not seem to be defined.
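`BaseElasticCorpora` relies on `UserDict` dispatching `corpora[key] = value` to `__setitem__` and `corpora[key]` to `__getitem__`. Its `__init__` never calls `UserDict.__init__`, so `self.data` is never created and inherited methods such as `len()` would raise `AttributeError`. The hypothetical `InMemoryCorpora` below is a minimal sketch that swaps Elasticsearch for a plain dict so the mapping protocol can be run locally, and it calls the parent initializer explicitly.

```python
try:
    from collections import UserDict  # Python 3
except ImportError:
    from UserDict import UserDict  # Python 2


class InMemoryCorpora(UserDict):
    """Hypothetical stand-in for BaseElasticCorpora backed by a local dict.

    self.data maps field name -> {doc_id: value}, playing the role of the index.
    """

    def __init__(self, batch_size=1000):
        # UserDict.__init__ creates self.data, which len(), `in`, etc. rely on.
        super(InMemoryCorpora, self).__init__()
        self.batch_size = batch_size  # kept for parity; unused in this sketch

    def __setitem__(self, key, value):
        # `value` is an iterable of (id, value) pairs, as es_setitem expects.
        field = self.data.setdefault(key, {})
        for doc_id, val in value:
            field[doc_id] = val

    def __getitem__(self, key):
        # Return (id, value) pairs lazily, like the es_getitem generator.
        return iter(self.data[key].items())


corpora = InMemoryCorpora()
corpora["text"] = [(1, "first doc"), (2, "second doc")]
print(sorted(corpora["text"]))  # [(1, 'first doc'), (2, 'second doc')]
print(len(corpora), "text" in corpora)  # 1 True
```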

class VectorizedElasticCorpora(BaseElasticCorpora):
[Coding Style] This class should have a docstring.
[Duplication] This code seems to be duplicated in your project.
[Comprehensibility Best Practice] The variable BaseElasticCorpora does not seem to be defined.
    def __setitem__(self, key, value):
        #id_term_map
        es_setitem(key,value.id_term_map.items(),"term",self.instance,self.index)
[Coding Style] Exactly one space required after comma (4 occurrences).
[Comprehensibility Best Practice] The variables key and self do not seem to be defined.
        #document_term_counts
        es_setitem(key,value.document_term_counts.items(),"document_term_count",self.instance,self.index)
[Coding Style] Exactly one space required after comma (4 occurrences).
        #doc_lengths
        es_setitem(key,value.doc_lengths.items(),"document_length",self.instance,self.index)
[Coding Style] Exactly one space required after comma (4 occurrences).
        #global term_frequency
        es_setitem(key,value.term_frequency.items(),"term_frequency",self.instance,self.index)
[Coding Style] Exactly one space required after comma (4 occurrences).
        #vectors
        es_setitem(key,value.vectors.items(),"vector",self.instance,self.index)
[Coding Style] Exactly one space required after comma (4 occurrences).
        # could either upload vectors explicitly here (above) or using Super (below)
        #super(VectorizedElasticCorpora, self).__setitem__(key, value)

    def __getitem__(self, key):
        # TODO: each of these should be retrieved from a query.  Populate the VectorizerOutput object
        # and return it.  These things can be iterators instead of dicts; VectorizerOutput should
        # not care.
[Coding Style] TODO and FIXME comments should generally be avoided.
        # TODO: this is the id->term map for the full set of unique terms across all docs
[Coding Style] TODO and FIXME comments should generally be avoided.
        id_term_map = {int(term_id): term for term_id, term in es_getitem(key,"term",self.instance,self.index,self.query)}
[Coding Style] Exactly one space required after comma (4 occurrences).
[Comprehensibility Best Practice] The variables term, term_id, self and key do not seem to be defined.
        # 15
        # TODO: this is the count of terms associated with each document
[Coding Style] TODO and FIXME comments should generally be avoided.
        document_term_count = {int(doc_id): doc_term_count for doc_id, doc_term_count in es_getitem(key,"document_term_count",self.instance,self.index,self.query)}
[Coding Style] Exactly one space required after comma (4 occurrences).
[Comprehensibility Best Practice] The variables doc_id and doc_term_count do not seem to be defined.
        # {"doc1": 3, "doc2": 5}
        doc_lengths = {int(doc_id): doc_length for doc_id, doc_length in es_getitem(key,"document_length",self.instance,self.index,self.query)}
[Coding Style] Exactly one space required after comma (4 occurrences).
[Comprehensibility Best Practice] The variable doc_length does not seem to be defined.
        term_frequency = {int(term_id): global_frequency for term_id, global_frequency in es_getitem(key,"term_frequency",self.instance,self.index,self.query)}
[Coding Style] Exactly one space required after comma (4 occurrences).
[Comprehensibility Best Practice] The variable global_frequency does not seem to be defined.
        # TODO: this is the vectorized representation of each document
[Coding Style] TODO and FIXME comments should generally be avoided.
        vectors = {int(doc_id): {int(term_id): term_weight for term_id, term_weight in doc_term_weights.items()} for doc_id, doc_term_weights in es_getitem(key,"vector",self.instance,self.index,self.query)}
[Coding Style] Exactly one space required after comma (4 occurrences).
[Comprehensibility Best Practice] The variables doc_term_weights and term_weight do not seem to be defined.
        #vectors = {int(doc_id): {doc_term_weights for doc_id, doc_term_weights in es_getitem(key,"vector",self.instance,self.index,self.query)}
        #vectors = list(es_getitem(key,"vector",self.instance,self.index,self.query))
        #  {"doc1": {1: 3, 2: 1}  # word id is key, word count is value (for bag of words model)
        return VectorizerOutput(id_term_map=id_term_map,
                                document_term_counts=document_term_count,
                                doc_lengths=doc_lengths,
                                term_frequency=term_frequency,
                                vectors=vectors)
[Comprehensibility Best Practice] The variables id_term_map, document_term_count, doc_lengths, term_frequency and vectors do not seem to be defined.
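The duplication reported for `VectorizedElasticCorpora` (and again for `ModeledElasticCorpora`) comes from one `es_setitem`/`es_getitem` call per output attribute, which is the kind of repetition the three-places rule targets. A hedged sketch of a table-driven alternative, not topik's actual design: each class would declare a mapping from output attribute to Elasticsearch field and loop over it. `VECTORIZED_FIELD_MAP`, `MODELED_FIELD_MAP`, `store_fields` and `load_fields` are hypothetical names; the field names are copied from the `es_setitem` calls in the two `__setitem__` methods, and `fake_setitem`/`fake_getitem` are in-memory stand-ins so the example runs without a cluster.

```python
from types import SimpleNamespace

# Output attribute -> Elasticsearch field, taken from the es_setitem calls in
# VectorizedElasticCorpora.__setitem__ and ModeledElasticCorpora.__setitem__.
VECTORIZED_FIELD_MAP = {
    "id_term_map": "term",
    "document_term_counts": "document_term_count",
    "doc_lengths": "document_length",
    "term_frequency": "term_frequency",
    "vectors": "vector",
}
MODELED_FIELD_MAP = {
    "vocab": "term",
    "term_frequency": "term_frequency",
    "topic_term_matrix": "topic_term_dist",
    "doc_lengths": "doc_length",
    "doc_topic_matrix": "doc_topic_dist",
}


def store_fields(key, value, field_map, setter):
    """Call setter(key, pairs, field) once per attribute in field_map."""
    for attribute, field in field_map.items():
        setter(key, getattr(value, attribute).items(), field)


def load_fields(key, field_map, getter):
    """Return {attribute: {id: value}} built from getter(key, field) pairs."""
    return {attribute: dict(getter(key, field))
            for attribute, field in field_map.items()}


# In-memory stand-ins for es_setitem/es_getitem: field -> {doc_id: {key: value}}
backend = {}


def fake_setitem(key, pairs, field):
    for doc_id, val in pairs:
        backend.setdefault(field, {}).setdefault(doc_id, {})[key] = val


def fake_getitem(key, field):
    for doc_id, source in backend.get(field, {}).items():
        yield doc_id, source[key]


output = SimpleNamespace(id_term_map={0: "apple"}, document_term_counts={1: 3},
                         doc_lengths={1: 5}, term_frequency={0: 2},
                         vectors={1: {0: 3}})
store_fields("bow", output, VECTORIZED_FIELD_MAP, fake_setitem)
loaded = load_fields("bow", VECTORIZED_FIELD_MAP, fake_getitem)
print(loaded["vectors"])  # {1: {0: 3}}
```

Because the keyword arguments in the original `VectorizerOutput(...)` call match the attribute names in the map, `__getitem__` could in principle end with `return VectorizerOutput(**load_fields(...))`. The sketch drops the explicit `int()` conversions the original applies (for example to the nested term ids in `vectors`), so those would still need handling.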

class ModeledElasticCorpora(BaseElasticCorpora):
[Coding Style] This class should have a docstring.
[Duplication] This code seems to be duplicated in your project.
[Comprehensibility Best Practice] The variable BaseElasticCorpora does not seem to be defined.
    def __setitem__(self, key, value):
        es_setitem(key,value.vocab.items(),"term",self.instance,self.index)
[Coding Style] Exactly one space required after comma (4 occurrences).
[Comprehensibility Best Practice] The variables self and key do not seem to be defined.
        es_setitem(key,value.term_frequency.items(),"term_frequency",self.instance,self.index)
[Coding Style] Exactly one space required after comma (4 occurrences).
        es_setitem(key,value.topic_term_matrix.items(),"topic_term_dist",self.instance,self.index)
[Coding Style] Exactly one space required after comma (4 occurrences).
        es_setitem(key,value.doc_lengths.items(),"doc_length",self.instance,self.index)
[Coding Style] Exactly one space required after comma (4 occurrences).
        es_setitem(key,value.doc_topic_matrix.items(),"doc_topic_dist",self.instance,self.index)
[Coding Style] Exactly one space required after comma (4 occurrences).

    def __lt__(self, y):
        return super(ModeledElasticCorpora, self).__lt__(y)
[Coding Style Naming] The name y does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
[Comprehensibility Best Practice] The variable y does not seem to be defined.

    def __getitem__(self, key):
111
        vocab = {int(term_id): term for term_id, term in \
0 ignored issues
show
Comprehensibility Best Practice introduced by
The variable term does not seem to be defined.
Loading history...
Comprehensibility Best Practice introduced by
The variable term_id does not seem to be defined.
Loading history...
112
                 es_getitem(key,"term",self.instance,self.index,self.query)}
0 ignored issues
show
Coding Style introduced by
Exactly one space required after comma
es_getitem(key,"term",self.instance,self.index,self.query)}
^
Loading history...
Coding Style introduced by
Exactly one space required after comma
es_getitem(key,"term",self.instance,self.index,self.query)}
^
Loading history...
Coding Style introduced by
Exactly one space required after comma
es_getitem(key,"term",self.instance,self.index,self.query)}
^
Loading history...
Coding Style introduced by
Exactly one space required after comma
es_getitem(key,"term",self.instance,self.index,self.query)}
^
Loading history...
Comprehensibility Best Practice introduced by
The variable self does not seem to be defined.
Loading history...
Comprehensibility Best Practice introduced by
The variable key does not seem to be defined.
Loading history...
113
        term_frequency = {int(term_id): tf for term_id, tf in \
0 ignored issues
show
Comprehensibility Best Practice introduced by
The variable tf does not seem to be defined.
Loading history...
114
                          es_getitem(key,"term_frequency",self.instance,self.index,self.query)}
0 ignored issues
show
Coding Style introduced by
Exactly one space required after comma
es_getitem(key,"term_frequency",self.instance,self.index,self.query)}
^
Loading history...
Coding Style introduced by
Exactly one space required after comma
es_getitem(key,"term_frequency",self.instance,self.index,self.query)}
^
Loading history...
Coding Style introduced by
Exactly one space required after comma
es_getitem(key,"term_frequency",self.instance,self.index,self.query)}
^
Loading history...
Coding Style introduced by
Exactly one space required after comma
es_getitem(key,"term_frequency",self.instance,self.index,self.query)}
^
Loading history...
        topic_term_matrix = {topic_id: topic_term_dist for topic_id, topic_term_dist in
                             es_getitem(key, "topic_term_dist", self.instance, self.index, self.query)}
        doc_lengths = {doc_id: doc_length for doc_id, doc_length in
                       es_getitem(key, "doc_length", self.instance, self.index, self.query)}
        doc_topic_matrix = {int(doc_id): doc_topic_dist for doc_id, doc_topic_dist in
                            es_getitem(key, "doc_topic_dist", self.instance, self.index, self.query)}
        return ModelOutput(vocab=vocab, term_frequency=term_frequency,
                           topic_term_matrix=topic_term_matrix,
                           doc_lengths=doc_lengths,
                           doc_topic_matrix=doc_topic_matrix)


@register_output
class ElasticSearchOutput(OutputInterface):
    """Store raw, tokenized, vectorized and modeled corpus data in an Elasticsearch index."""
    def __init__(self, source, index, hash_field=None, doc_type='continuum',
                 query=None, iterable=None, filter_expression="",
                 vectorized_corpora=None, tokenized_corpora=None, modeled_corpora=None,
                 **kwargs):
        super(ElasticSearchOutput, self).__init__()
        self.hosts = source
        self.instance = Elasticsearch(hosts=source, **kwargs)
        self.index = index
        self.doc_type = doc_type
        self.query = query
        self.hash_field = hash_field
        if iterable:
            self.import_from_iterable(iterable, hash_field)
        self.filter_expression = filter_expression

        # Fall back to Elasticsearch-backed corpora when none are supplied.
        self.tokenized_corpora = tokenized_corpora or \
            BaseElasticCorpora(self.instance, self.index, 'tokenized', self.query)
        self.vectorized_corpora = vectorized_corpora or \
            VectorizedElasticCorpora(self.instance, self.index, 'vectorized', self.query)
        self.modeled_corpora = modeled_corpora or \
            ModeledElasticCorpora(self.instance, self.index, 'models', self.query)

    @property
    def filter_string(self):
        """Return the filter expression passed to the constructor."""
        return self.filter_expression

    def import_from_iterable(self, iterable, field_to_hash='text', batch_size=500):
        """Load data into Elasticsearch from an iterable.

        iterable: generally a list of dicts, but possibly a list of strings.
            This is your data. Your dictionary structure defines the schema
            of the Elasticsearch index.
        field_to_hash: name of the field whose value is hashed to produce the
            document ID. For a list of dicts, it must be a key present in every
            dictionary. For a list of strings, each string is wrapped in a
            dictionary with this single key ("text" by default).
        batch_size: number of upsert actions sent per bulk request.
        """
        from six import string_types  # basestring on Python 2, str on Python 3

        if not field_to_hash:
            raise ValueError("A field_to_hash is required for import_from_iterable")
        self.hash_field = field_to_hash
        batch = []
        for item in iterable:
            if isinstance(item, string_types):
                item = {field_to_hash: item}
            # On Python 3.3+, hash() of str values is randomized per process unless
            # PYTHONHASHSEED is set, so these IDs can differ between runs.
            doc_id = hash(item[field_to_hash])
            action = {'_op_type': 'update',
                      '_index': self.index,
                      '_type': self.doc_type,
                      '_id': doc_id,
                      'doc': item,
                      'doc_as_upsert': "true",
                      }
            batch.append(action)
            if len(batch) >= batch_size:
                helpers.bulk(client=self.instance, actions=batch, index=self.index)
                batch = []
        if batch:
            helpers.bulk(client=self.instance, actions=batch, index=self.index)
        self.instance.indices.refresh(self.index)

    def convert_date_field_and_reindex(self, field):
        """Return an index (``self.index`` or a reindexed copy) that maps ``field`` as a date."""
        def is_date_field(index_name):
            # Response shape: {index: {"mappings": {doc_type: {field: {"mapping": {leaf: {...}}}}}}}
            response = self.instance.indices.get_field_mapping(
                fields=[field], index=index_name, doc_type=self.doc_type)
            types = [leaf.get("type")
                     for index_body in response.values()
                     for field_body in index_body.get("mappings", {}).get(self.doc_type, {}).values()
                     for leaf in field_body.get("mapping", {}).values()]
            return bool(types) and all(field_type == "date" for field_type in types)

        if is_date_field(self.index):
            return self.index
        index = "{}_{}_alias_date".format(self.index, field)
        if not self.instance.indices.exists(index=index) or not is_date_field(index):
            # An alias cannot change a field's mapping, so build a separate index
            # that declares ``field`` as a date and copy the documents into it.
            if self.instance.indices.exists_alias(name=index):
                self.instance.indices.delete_alias(index="_all", name=index)
            else:
                self.instance.indices.delete(index=index, ignore=404)
            mapping = self.instance.indices.get_mapping(index=self.index,
                                                        doc_type=self.doc_type)
            doc_mapping = next(iter(mapping.values()))["mappings"][self.doc_type]
            doc_mapping["properties"][field] = {"type": "date"}
            self.instance.indices.create(index=index,
                                         body={"mappings": {self.doc_type: doc_mapping}})
            helpers.reindex(self.instance, source_index=self.index, target_index=index)
            self.instance.indices.refresh(index=index)
            while (self.instance.count(index=self.index)["count"] !=
                   self.instance.count(index=index)["count"]):
                logging.info("Waiting for date indexed data to be indexed...")
                time.sleep(1)
        return index

    # TODO: validate input data to ensure that it has valid year data
    def get_date_filtered_data(self, field_to_get, start, end, filter_field="date"):
        """Yield ``(id, value)`` pairs for documents whose ``filter_field`` is in [start, end].

        ``value`` is the document's ``field_to_get`` entry; both range bounds are inclusive.
        """
        converted_index = self.convert_date_field_and_reindex(field=filter_field)

        query = {"query": {"filtered": {"filter": {"range": {
            filter_field: {"gte": start, "lte": end}}}}}}
        results = helpers.scan(self.instance, index=converted_index,
                               doc_type=self.doc_type, query=query)
        for result in results:
            yield result["_id"], result['_source'][field_to_get]

    def get_filtered_data(self, field_to_get, filter=""):  # pylint: disable=redefined-builtin
        """Yield ``(id, value)`` pairs of ``field_to_get`` for documents matching ``self.query``.

        The ``filter`` argument is currently ignored.
        """
        results = helpers.scan(self.instance, index=self.index,
                               query=self.query, doc_type=self.doc_type)
        for result in results:
            yield result["_id"], result['_source'][field_to_get]

    def save(self, filename, saved_data=None):
        if saved_data is None:
            saved_data = {"source": self.hosts, "index": self.index, "hash_field": self.hash_field,
                          "doc_type": self.doc_type, "query": self.query}
        return super(ElasticSearchOutput, self).save(filename, saved_data)

    def synchronize(self, max_wait, field):
        """Block until no document of ``self.doc_type`` in the index is missing ``field``.

        ``max_wait`` is currently unused, so this polls (every 10 ms) indefinitely.
        """
        # TODO: change this to a more general condition for wider use, including read_input
        # could just pass in a string condition and then 'while not eval(condition)'
        count_not_yet_updated = -1
        while count_not_yet_updated != 0:
            count_not_yet_updated = self.instance.count(
                index=self.index,
                doc_type=self.doc_type,
                body={"query": {"constant_score": {"filter": {"missing": {"field": field}}}}},
            )['count']
            logging.debug("Count not yet updated: %d", count_not_yet_updated)
            time.sleep(0.01)