TokenizerRegistry (Rating: A)

Complexity
    Total Complexity: 1

Size/Duplication
    Total Lines: 8
    Duplicated Lines: 0%

Importance
    Changes: 2
    Bugs: 1    Features: 0

Metric    Value
wmc       1
c         2
b         1
f         0
dl        0
loc       8
rs        10

1 Method

Rating    Name          Duplication    Size    Complexity
A         __init__()    0              3       1
from six.moves import UserDict
Coding Style: This module should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below, you find an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend reading PEP-257: Docstring Conventions.
Configuration: The import six.moves could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary install commands, for example:

# .scrutinizer.yml
before_commands:
    - sudo pip install abc   # Python 2
    - sudo pip3 install abc  # Python 3

Tip: We are currently not using virtualenv to run Pylint. When installing your modules, make sure to use the command for the correct Python version.

2. Missing __init__.py files

This error can also result from missing __init__.py files in your module folders. Make sure that you place one in each sub-folder, as in the layout sketched below.
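For example, a package needs an __init__.py file at every level so that Pylint (and the import system) can resolve the modules; the layout below is purely illustrative:

topik/
    __init__.py
    tokenizers/
        __init__.py
        simple.py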

from functools import partial
from topik.singleton_registry import _base_register_decorator
# This subclass serves to establish a new singleton instance of functions
#    for this particular step in topic modeling.  No implementation necessary.
class TokenizerRegistry(UserDict, object):
Comprehensibility Best Practice: The variable object does not seem to be defined.
Comprehensibility Best Practice: The variable UserDict does not seem to be defined.
    """Uses Borg design pattern.  Core idea is that there is a global registry for each step's
    possible methods
    """
    __shared_state = {}
    def __init__(self, *args, **kwargs):
        self.__dict__ = self.__shared_state
Comprehensibility Best Practice: The variable self does not seem to be defined.
        super(TokenizerRegistry, self).__init__(*args, **kwargs)
Comprehensibility Best Practice: The variable args does not seem to be defined.
Comprehensibility Best Practice: The variable kwargs does not seem to be defined.
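A quick illustration of the Borg behaviour described in the class docstring: every instance shares the same underlying state, so entries added through one handle are visible through any other. This is only a sketch; the lambda tokenizer below is hypothetical.

a = TokenizerRegistry()
b = TokenizerRegistry()          # create handles first: __init__ re-initialises the shared mapping
a["simple"] = lambda text: text.split()   # hypothetical tokenizer
assert b["simple"] is a["simple"]         # both handles see the same registry entry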
# a nicer, more pythonic handle to our singleton instance
registered_tokenizers = TokenizerRegistry()
Coding Style Naming: The name registered_tokenizers does not conform to the constant naming conventions ((([A-Z_][A-Z0-9_]*)|(__.*__))$).

This check looks for invalid names for a range of different identifiers. You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements, as in the configuration sketch below. If your project includes a Pylint configuration file, the settings contained in that file take precedence. To find out more about Pylint, please refer to their site.
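For example, a Pylint configuration file can relax the constant-name pattern; the regular expression below is only an illustration and should be adjusted to your own conventions:

# .pylintrc
[BASIC]
const-rgx=(([a-z_][a-z0-9_]*)|([A-Z_][A-Z0-9_]*)|(__.*__))$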

# fill in the registration function
register = partial(_base_register_decorator, registered_tokenizers)
Coding Style Naming: The name register does not conform to the constant naming conventions ((([A-Z_][A-Z0-9_]*)|(__.*__))$). (Same check as for registered_tokenizers above.)
Comprehensibility Best Practice: The variable _base_register_decorator does not seem to be defined.
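_base_register_decorator is defined in topik.singleton_registry; assuming it adds the decorated function to the supplied registry under the function's name (an assumption based on its name and usage here), registering a new tokenizer would look roughly like this hypothetical sketch:

@register
def whitespace(text):
    """Hypothetical tokenizer: lower-case the text and split on whitespace."""
    return text.lower().split()

# the function should then be reachable through the shared registry,
# e.g. registered_tokenizers["whitespace"] -- the key used here is an assumption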
def tokenize(corpus, method="simple", **kwargs):
    """Break documents up into component words, optionally eliminating stopwords.

    Output from this function is used as input to vectorization steps.

    corpus: iterable corpus object containing the text to be processed.
        Each iteration call should return a new document's content.
    method: string id of tokenizer to use.  For keys, see
        topik.tokenizers.registered_tokenizers (which is a dictionary of functions)
    kwargs: arbitrary dictionary of extra parameters.
    """
    return registered_tokenizers[method](corpus, **kwargs)
Comprehensibility Best Practice: The variable kwargs does not seem to be defined.
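A usage sketch, assuming a tokenizer named "simple" has been registered and that the corpus yields one document's text per iteration (the sample data is illustrative; the exact return value depends on the selected tokenizer):

corpus = ["The quick brown fox", "jumps over the lazy dog"]
tokens = tokenize(corpus, method="simple")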