Completed
Push — master ( 980041...30b693 )
by
unknown
10s
created

TokenizerRegistry.__init__()   A

Complexity: Conditions 1
Size: Total Lines 3
Duplication: Lines 0, Ratio 0 %

Metric  Value
dl      0
loc     3
rs      10
cc      1
from six.moves import UserDict
Coding Style
This module should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below is an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend reading PEP 257: Docstring Conventions.

Configuration
The import six.moves could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a Pylint configuration issue. Make sure that your libraries are available by adding the necessary install commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint. When installing your modules, make sure to use the command for the correct Python version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

from functools import partial

from topik.singleton_registry import _base_register_decorator


# This subclass serves to establish a new singleton instance of functions
#    for this particular step in topic modeling.  No implementation necessary.
class TokenizerRegistry(UserDict, object):
    """Uses Borg design pattern.  Core idea is that there is a global registry for each step's
    possible methods
    """
    __shared_state = {}
    def __init__(self, *args, **kwargs):
        self.__dict__ = self.__shared_state
        super(TokenizerRegistry, self).__init__(*args, **kwargs)


# a nicer, more pythonic handle to our singleton instance
registered_tokenizers = TokenizerRegistry()
Coding Style: Naming
The name registered_tokenizers does not conform to the constant naming conventions ((([A-Z_][A-Z0-9_]*)|(__.*__))$).

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.
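
As a rough illustration of the Borg pattern named in the TokenizerRegistry docstring, the sketch below uses Python 3's collections.UserDict (which six.moves.UserDict resolves to on Python 3). The class name BorgRegistry is hypothetical and not part of topik. One detail worth keeping in mind: UserDict.__init__ assigns a fresh self.data, so with a shared __dict__ a second instantiation would replace the shared contents. The merge step here is only one possible way around that, not the project's approach.

```python
from collections import UserDict


class BorgRegistry(UserDict):
    """Hypothetical Borg-style registry: all instances share one __dict__."""

    _shared_state = {}

    def __init__(self, *args, **kwargs):
        # Point this instance's attribute dictionary at the shared one.
        self.__dict__ = self._shared_state
        # Remember entries stored through earlier instances, if any.
        existing = self._shared_state.get("data")
        # UserDict.__init__ rebinds self.data to a new dict and applies args.
        super().__init__(*args, **kwargs)
        if existing is not None:
            # Merge the new entries into the earlier dict and restore it,
            # so older instances keep seeing the same object.
            existing.update(self.data)
            self.data = existing


first = BorgRegistry()
first["simple"] = str.split
second = BorgRegistry()
print("simple" in second)
```

Because both instances share the same attribute dictionary, an entry stored through one is visible through the other.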

# fill in the registration function
register = partial(_base_register_decorator, registered_tokenizers)
Coding Style: Naming
The name register does not conform to the constant naming conventions ((([A-Z_][A-Z0-9_]*)|(__.*__))$).

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.
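
To see why both module-level names are flagged, the pattern quoted in the messages can be tried directly. This is a minimal sketch that assumes the pattern is applied from the start of the name, as re.match does. The helper name conforms and the uppercase candidate are made up for illustration.

```python
import re

# Pattern quoted in the naming messages above.
CONST_NAME_RE = re.compile(r"(([A-Z_][A-Z0-9_]*)|(__.*__))$")


def conforms(name):
    """Return True if `name` matches the constant naming pattern."""
    return CONST_NAME_RE.match(name) is not None


for candidate in ("registered_tokenizers", "register",
                  "REGISTERED_TOKENIZERS", "__all__"):
    print(candidate, conforms(candidate))
```

The two lowercase names fail the pattern, while an all-uppercase name or a dunder name passes. Whether to rename the objects or relax the expression in a Pylint configuration file is a project decision.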

def tokenize(corpus, method="simple", **kwargs):
    """Break documents up into component words, optionally eliminating stopwords.

    Output from this function is used as input to vectorization steps.

    corpus: iterable corpus object containing the text to be processed.
        Each iteration call should return a new document's content.
    method: string id of tokenizer to use.  For keys, see
        topik.tokenizers.registered_tokenizers (which is a dictionary of functions)
    kwargs: arbitrary dictionary of extra parameters.
    """
    return registered_tokenizers[method](corpus, **kwargs)
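
The registration and dispatch flow in this module can be sketched in isolation as follows. The signature of _base_register_decorator is not shown on this page, so _register_into is a hypothetical stand-in that takes the registry first and the function second. The plain dict and the toy tokenizer are likewise assumptions made for the example.

```python
from functools import partial

# Plain dict standing in for the shared registry (assumption for this sketch).
registry = {}


def _register_into(registry, func):
    """Hypothetical stand-in for _base_register_decorator: store func by name."""
    registry[func.__name__] = func
    return func


# Binding the registry up front leaves a one-argument callable
# that can be used as a decorator.
register = partial(_register_into, registry)


@register
def simple(corpus, stopwords=()):
    """Toy tokenizer: lowercase, split on whitespace, drop stopwords."""
    return [[w for w in doc.lower().split() if w not in stopwords]
            for doc in corpus]


def tokenize(corpus, method="simple", **kwargs):
    """Look up the tokenizer registered under `method` and apply it."""
    return registry[method](corpus, **kwargs)


tokens = tokenize(["The cat sat", "on the mat"], stopwords={"the"})
print(tokens)  # [['cat', 'sat'], ['on', 'mat']]
```

With this layout, passing a method key that was never registered raises a KeyError from the dictionary lookup.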