Completed
Pull Request — master (#78)
by
unknown
01:27 queued 19s
created

GreedyDict   A

Complexity

Total Complexity 5

Size/Duplication

Total Lines 9
Duplicated Lines 0 %
Metric Value
wmc 5
dl 0
loc 9
rs 10

2 Methods

Rating   Name   Duplication   Size   Complexity  
A __setitem__() 0 4 3
A __iter__() 0 3 2
1
from six.moves import UserDict
0 ignored issues
show
Coding Style introduced by
This module should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below, you find an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend to read PEP-257: Docstring Conventions.

Loading history...
Configuration introduced by
The import six.moves could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Loading history...
2
import types
3
4
from ._registry import register_output
5
from .base_output import OutputInterface
6
7
8
class GreedyDict(UserDict, object):
0 ignored issues
show
Coding Style introduced by
This class should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below, you find an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend to read PEP-257: Docstring Conventions.

Loading history...
9
    def __setitem__(self, key, value):
10
        if isinstance(value, types.GeneratorType):
11
            value = [val for val in value]
12
        super(GreedyDict, self).__setitem__(key, value)
13
14
    def __iter__(self):
15
        for val in self.data.values():
0 ignored issues
show
Bug introduced by
The Instance of GreedyDict does not seem to have a member named data.

This check looks for calls to members that are non-existent. These calls will fail.

The member could have been renamed or removed.

Loading history...
16
            yield val
17
18
19
@register_output
0 ignored issues
show
Coding Style introduced by
This class should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below, you find an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend to read PEP-257: Docstring Conventions.

Loading history...
Unused Code introduced by
This interface does not seem to be used anywhere.
Loading history...
20
class InMemoryOutput(OutputInterface):
21
    def __init__(self, iterable=None, hash_field=None,
22
                 tokenized_corpora=None,
23
                 vectorized_corpora=None, modeled_corpora=None):
24
        super(InMemoryOutput, self).__init__()
25
26
        self.corpus = GreedyDict()
27
28
        if iterable:
29
            self.import_from_iterable(iterable, hash_field)
30
31
        self.tokenized_corpora = tokenized_corpora if tokenized_corpora else GreedyDict()
32
        self.vectorized_corpora = vectorized_corpora if vectorized_corpora else GreedyDict()
33
        self.modeled_corpora = modeled_corpora if modeled_corpora else GreedyDict()
34
35
    def import_from_iterable(self, iterable, field_to_hash):
36
        """
37
        iterable: generally a list of dicts, but possibly a list of strings
38
            This is your data.  Your dictionary structure defines the schema
39
            of the elasticsearch index.
40
        """
41
        self.hash_field=field_to_hash
0 ignored issues
show
Coding Style introduced by
Exactly one space required around assignment
self.hash_field=field_to_hash
^
Loading history...
42
        for item in iterable:
43
            if isinstance(item, basestring):
0 ignored issues
show
Comprehensibility Best Practice introduced by
Undefined variable 'basestring'
Loading history...
44
                item = {field_to_hash: item}
45
            elif field_to_hash not in item and field_to_hash in item.values()[0]:
46
                item = item.values()[0]
47
            id = hash(item[field_to_hash])
0 ignored issues
show
Bug Best Practice introduced by
This seems to re-define the built-in id.

It is generally discouraged to redefine built-ins as this makes code very hard to read.

Loading history...
Coding Style Naming introduced by
The name id does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.

Loading history...
48
            self.corpus[id] = item
49
50
    # TODO: generalize for datetimes
0 ignored issues
show
Coding Style introduced by
TODO and FIXME comments should generally be avoided.
Loading history...
51
    # TODO: validate input data to ensure that it has valid year data
0 ignored issues
show
Coding Style introduced by
TODO and FIXME comments should generally be avoided.
Loading history...
52
    def get_date_filtered_data(self, field_to_get, start, end, filter_field="year"):
0 ignored issues
show
Coding Style introduced by
This method should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below, you find an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend to read PEP-257: Docstring Conventions.

Loading history...
53
        return self.get_filtered_data(field_to_get,
54
                                      "{}<=int({}['{}'])<={}".format(start, "{}",
55
                                                                     filter_field, end))
56
57
    def get_filtered_data(self, field_to_get, filter=""):
0 ignored issues
show
Bug Best Practice introduced by
This seems to re-define the built-in filter.

It is generally discouraged to redefine built-ins as this makes code very hard to read.

Loading history...
58
        if not filter:
59
            for doc_id, doc in self.corpus.items():
0 ignored issues
show
Bug introduced by
The Instance of GreedyDict does not seem to have a member named items.

This check looks for calls to members that are non-existent. These calls will fail.

The member could have been renamed or removed.

Loading history...
60
                yield doc_id, doc[field_to_get]
61
        else:
62
            for doc_id, doc in self.corpus.items():
0 ignored issues
show
Bug introduced by
The Instance of GreedyDict does not seem to have a member named items.

This check looks for calls to members that are non-existent. These calls will fail.

The member could have been renamed or removed.

Loading history...
63
                if eval(filter.format(doc)):
0 ignored issues
show
Security Best Practice introduced by
eval should generally only be used if absolutely necessary.

The usage of eval might allow for executing arbitrary code if a user manages to inject dynamic input. Please use this language feature with care and only when you are sure of the input.

Loading history...
64
                    yield doc_id, doc[field_to_get]
65
66
    def save(self, filename):
0 ignored issues
show
Bug introduced by
Arguments number differs from overridden 'save' method
Loading history...
67
        saved_data = {"iterable": self.corpus,
68
                      "hash_field": self.hash_field,
69
                      "modeled_corpora": self.modeled_corpora,
70
                      "vectorized_corpora": self.vectorized_corpora,
71
                      "tokenized_corpora": self.tokenized_corpora}
72
        return super(InMemoryOutput, self).save(filename, saved_data)
73