Completed: push to master (980041...30b693) by unknown (10s)

topik.fileio.tests.ProjectTest   A

Complexity

Total Complexity 18

Size/Duplication

Total Lines 67
Duplicated Lines 0 %
Metric   Value
dl       0
loc      67
rs       10
wmc      18

8 Methods

Rating   Name                                                   Duplication   Size   Complexity
A        ProjectTest.test_read_input()                          0             2      1
A        ProjectTest.test_model()                               0             8      3
A        ProjectTest.test_vectorize()                           0             5      1
A        ProjectTest.test_get_filtered_corpus_iterator()        0             4      1
A        ProjectTest.test_get_date_filtered_corpus_iterator()   0             4      1
C        ProjectTest.test_context_manager()                     0             23     7
A        ProjectTest.test_visualize()                           0             5      1
A        ProjectTest.test_tokenize()                            0             8      3
import glob
Coding Style: This module should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below is an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend reading PEP 257: Docstring Conventions.
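The docstring messages in this report apply at three levels: module, class, and method. A minimal sketch of all three follows, assuming Python 3; the wording of each docstring and the return value are hypothetical and only illustrate placement.

```python
"""Tests for a hypothetical project module (this line is the module docstring)."""


class SomeClass(object):
    """Group related behaviour (this line is the class docstring)."""

    def some_method(self):
        """Do x and return foo."""
        # The docstring must be the first statement in the body.
        return "foo"
```

Each docstring is stored on the object's `__doc__` attribute, which is what documentation tools and `help()` read.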
import os
import time
import unittest

import elasticsearch
import nose.tools as nt
Configuration: The import nose.tools could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3

Tip: We are currently not using virtualenv to run pylint. When installing your modules, make sure to use the command for the correct Python version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.
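To check locally whether an import can be resolved in a given interpreter, something like the following sketch may help. It assumes Python 3.4 or later (for `importlib.util.find_spec`); the helper name `is_importable` is hypothetical and is not part of Pylint or Scrutinizer.

```python
import importlib.util


def is_importable(module_name):
    """Return True if the interpreter can locate module_name."""
    try:
        # find_spec returns None when a top-level module is not found.
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # For dotted names, a missing parent package raises ImportError.
        return False


if __name__ == "__main__":
    for name in ("nose.tools", "elasticsearch"):
        print(name, "resolvable:", is_importable(name))
```

Running the script with the same interpreter that the analysis uses shows whether the dependency is installed for that Python version.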
from topik.fileio import TopikProject
from topik.fileio.tests import test_data_path

# make logging quiet during testing, to keep Travis CI logs short.
import logging
logging.basicConfig()
logging.getLogger('elasticsearch').setLevel(logging.ERROR)
logging.getLogger('urllib3').setLevel(logging.ERROR)

SAVE_FILENAME = "test_project"

sample_tokenized_doc = (2318580746137828354,
Coding Style Naming: The name sample_tokenized_doc does not conform to the constant naming conventions ((([A-Z_][A-Z0-9_]*)|(__.*__))$).

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.
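The pattern quoted in the message can be tried directly against candidate names. This is a small sketch, assuming the check applies the expression with `re.match` (anchored at the start of the name); the helper `is_valid_constant` is a hypothetical name used only for illustration.

```python
import re

# Pattern copied from the naming message for module-level constants.
CONST_RGX = re.compile(r"(([A-Z_][A-Z0-9_]*)|(__.*__))$")


def is_valid_constant(name):
    """Return True if name satisfies the constant naming pattern."""
    return CONST_RGX.match(name) is not None


if __name__ == "__main__":
    for candidate in ("sample_tokenized_doc", "SAMPLE_TOKENIZED_DOC",
                      "SAVE_FILENAME", "__all__"):
        print(candidate, is_valid_constant(candidate))
```

Under this pattern, upper-case names such as `SAVE_FILENAME` pass, while lower-case module-level names are reported.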
 [u'nano', u'sized', u'tio', u'particles', u'applications', u'including',
  u'use', u'photocatalysts', u'heat', u'transfer', u'fluids', u'nanofluids',
  u'present', u'study', u'tio', u'nanoparticles', u'controllable', u'phase',
  u'particle', u'size', u'obtained', u'homogeneous', u'gas', u'phase',
  u'nucleation', u'chemical', u'vapor', u'condensation', u'cvc', u'phase',
  u'particle', u'size', u'tio', u'nanoparticles', u'processing', u'conditions',
  u'characterized', u'x', u'ray', u'diffraction', u'transmission', u'electron',
  u'microscopy', u'chamber', u'temperature', u'pressure', u'key', u'parameters',
  u'affecting', u'particle', u'phase', u'size', u'pure', u'anatase', u'phase',
  u'observed', u'synthesis', u'temperatures', u'low', u'c', u'chamber',
  u'pressure', u'varying', u'torr', u'furnace', u'temperature', u'increased',
  u'c', u'pressure', u'torr', u'mixture', u'anatase', u'rutile', u'phases',
  u'observed', u'predominant', u'phase', u'anatase', u'average', u'particle',
  u'size', u'experimental', u'conditions', u'observed', u'nm'])

test_data_path = os.path.join(test_data_path, "test_data_json_stream.json")
Coding Style Naming: The name test_data_path does not conform to the constant naming conventions ((([A-Z_][A-Z0-9_]*)|(__.*__))$).

class ProjectTest(object):
Coding Style: This class should have a docstring.
    def test_context_manager(self):
Coding Style: This method should have a docstring.
        for filename in glob.glob("context_output*"):
            os.remove(filename)
        with TopikProject("context_output", self.output_type, self.output_args) as project:
Bug: The Instance of ProjectTest does not seem to have a member named output_type.
Bug: The Instance of ProjectTest does not seem to have a member named output_args.

This check looks for calls to members that are non-existent. These calls will fail.

The member could have been renamed or removed.
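In the reviewed file these members are assigned in the `setUp` methods of the concrete test classes, so the shared class never defines them itself. One possible way to make them visible to static analysis is to declare placeholders on the shared class. The sketch below assumes Python 3; the names `ProjectTestMixin`, `TestInMemory`, and `test_output_type_is_set` are hypothetical and do not appear in the reviewed code.

```python
import unittest


class ProjectTestMixin(object):
    """Shared tests; concrete subclasses fill in the attributes in setUp."""

    # Class-level placeholders so the members exist on the mixin itself.
    output_type = None
    output_args = None

    def test_output_type_is_set(self):
        """Check that the concrete test case configured an output type."""
        assert self.output_type is not None


class TestInMemory(ProjectTestMixin, unittest.TestCase):
    """Hypothetical concrete case mirroring the in-memory configuration."""

    def setUp(self):
        # Instance attributes override the class-level placeholders.
        self.output_type = "InMemoryOutput"
        self.output_args = {}


if __name__ == "__main__":
    unittest.main()
```

Because the mixin does not inherit from `unittest.TestCase`, test runners collect the shared tests only through the concrete subclasses.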
            project.read_input(source=test_data_path, content_field='abstract')
            project.tokenize()
            project.vectorize(method='bag_of_words')
            project.run_model(model_name='lda', ntopics=2)

        # The above runs through a whole workflow (minus plotting). At the end, it closes the file.
        # Load the output here.
        with TopikProject("context_output") as project:
            nt.assert_equal(len(list(project.get_filtered_corpus_iterator())), 100)
            nt.assert_true(sample_tokenized_doc in list(iter(project.selected_tokenized_corpus)))
            nt.assert_equal(project.selected_vectorized_corpus.global_term_count, 2434)
            nt.assert_equal(len(project.selected_vectorized_corpus), 100)  # All documents processed
            for doc in project.selected_modeled_corpus.doc_topic_matrix.values():
                nt.assert_almost_equal(sum(doc), 1)
            for topic in project.selected_modeled_corpus.topic_term_matrix.values():
                nt.assert_almost_equal(sum(topic), 1)

        for filename in glob.glob("context_output*"):
            os.remove(filename)

    def test_read_input(self):
Coding Style: This method should have a docstring.
        nt.assert_equal(len(list(self.project.get_filtered_corpus_iterator())), 100)
Bug: The Instance of ProjectTest does not seem to have a member named project.

    def test_get_filtered_corpus_iterator(self):
Coding Style Naming: The name test_get_filtered_corpus_iterator does not conform to the method naming conventions ([a-z_][a-z0-9_]{2,30}$).
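The method pattern allows one leading character plus 2 to 30 further characters, so names longer than 31 characters are rejected even when they are otherwise well formed. A short sketch, assuming the expression is applied with `re.match`; the helper `is_valid_method_name` and the shorter alternative name are hypothetical.

```python
import re

# Pattern copied from the naming message for methods.
METHOD_RGX = re.compile(r"[a-z_][a-z0-9_]{2,30}$")


def is_valid_method_name(name):
    """Return True if name satisfies the method naming pattern."""
    return METHOD_RGX.match(name) is not None


if __name__ == "__main__":
    for candidate in ("test_get_filtered_corpus_iterator",
                      "test_filtered_corpus_iter"):
        print(candidate, len(candidate), is_valid_method_name(candidate))
```

Projects that prefer long, descriptive test names can instead relax the expression in their Pylint configuration file, as the message notes.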
Coding Style: This method should have a docstring.
        doc_list = list(self.project.get_filtered_corpus_iterator())
Bug: The Instance of ProjectTest does not seem to have a member named project.
        nt.assert_equal(type(doc_list[0]), type(('123', 'text')))
        nt.assert_equal(len(doc_list), 100)

    def test_get_date_filtered_corpus_iterator(self):
Coding Style Naming: The name test_get_date_filtered_corpus_iterator does not conform to the method naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style: This method should have a docstring.
        results = list(self.project.get_date_filtered_corpus_iterator(
Bug: The Instance of ProjectTest does not seem to have a member named project.
            field_to_get="abstract", start=1975, end=1999, filter_field='year'))
        nt.assert_equal(len(results), 25)

    def test_tokenize(self):
Coding Style: This method should have a docstring.
        self.project.tokenize('simple')
Bug: The Instance of ProjectTest does not seem to have a member named project.
        in_results = False
        for id, doc in self.project.selected_tokenized_corpus:
Bug Best Practice: This seems to re-define the built-in id.

It is generally discouraged to redefine built-ins as this makes code very hard to read.

Coding Style Naming: The name id does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
Bug: The Instance of ProjectTest does not seem to have a member named project.
Unused Code: The variable id seems to be unused.
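The built-in, naming, and unused-variable messages on this loop all concern the same identifier, so a single rename could plausibly address them together. The sketch below uses made-up sample data and a hypothetical helper `contains_doc`; it illustrates the renaming idea only and is not the project's code.

```python
def contains_doc(tokenized_corpus, wanted_tokens):
    """Return True if any (doc_id, tokens) pair has tokens equal to wanted_tokens."""
    # _doc_id is at least three characters long, does not shadow the
    # built-in id(), and its leading underscore signals that it is unused.
    return any(tokens == wanted_tokens for _doc_id, tokens in tokenized_corpus)


if __name__ == "__main__":
    # Made-up sample data: (document id, token list) pairs.
    corpus = [(101, ["nano", "sized"]), (102, ["heat", "transfer"])]
    print(contains_doc(corpus, ["heat", "transfer"]))
```

Using `any` with a generator also removes the need for the flag variable and the explicit `break`.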
            if doc in sample_tokenized_doc:
                in_results = True
                break
        nt.assert_true(in_results)

    def test_vectorize(self):
Coding Style: This method should have a docstring.
        self.project.tokenize()
Bug: The Instance of ProjectTest does not seem to have a member named project.
        self.project.vectorize()
Bug: The Instance of ProjectTest does not seem to have a member named project.
        nt.assert_equal(self.project.selected_vectorized_corpus.global_term_count, 2434)
Bug: The Instance of ProjectTest does not seem to have a member named project.
        nt.assert_equal(len(self.project.selected_vectorized_corpus), 100)  # All documents processed
Bug: The Instance of ProjectTest does not seem to have a member named project.

    def test_model(self):
Coding Style: This method should have a docstring.
        self.project.tokenize()
Bug: The Instance of ProjectTest does not seem to have a member named project.
        self.project.vectorize()
Bug: The Instance of ProjectTest does not seem to have a member named project.
        self.project.run_model(model_name='lda', ntopics=2)
Bug: The Instance of ProjectTest does not seem to have a member named project.
        for doc in self.project.selected_modeled_corpus.doc_topic_matrix.values():
Bug: The Instance of ProjectTest does not seem to have a member named project.
            nt.assert_almost_equal(sum(doc), 1)
        for topic in self.project.selected_modeled_corpus.topic_term_matrix.values():
Bug: The Instance of ProjectTest does not seem to have a member named project.
            nt.assert_almost_equal(sum(topic), 1)

    def test_visualize(self):
Coding Style: This method should have a docstring.
        self.project.tokenize()
Bug: The Instance of ProjectTest does not seem to have a member named project.
        self.project.vectorize(method='bag_of_words')
Bug: The Instance of ProjectTest does not seem to have a member named project.
        self.project.run_model(ntopics=2)
Bug: The Instance of ProjectTest does not seem to have a member named project.
        self.project.visualize(vis_name='termite', topn=5)
Bug: The Instance of ProjectTest does not seem to have a member named project.


class TestInMemoryOutput(unittest.TestCase, ProjectTest):
Coding Style: This class should have a docstring.
    def setUp(self):
        self.output_type = "InMemoryOutput"
        self.output_args = {}
        self.project = TopikProject("test_project",
                                    output_type=self.output_type,
                                    output_args=self.output_args)
        self.project.read_input(test_data_path, content_field="abstract")

class TestElasticSearchOutput(unittest.TestCase, ProjectTest):
Coding Style: This class should have a docstring.
    INDEX = "test_index"
    def setUp(self):
        self.output_type = "ElasticSearchOutput"
        self.output_args = {'source': 'localhost',
                            'index': TestElasticSearchOutput.INDEX,
                            'content_field': "abstract"}
        self.project = TopikProject("test_project", output_type=self.output_type,
                                    output_args=self.output_args)
        self.project.read_input(test_data_path, content_field="abstract",
                                synchronous_wait=30)

    def tearDown(self):
        instance = elasticsearch.Elasticsearch("localhost")
        instance.indices.delete(TestElasticSearchOutput.INDEX)
        if instance.indices.exists("{}_year_alias_date".format(TestElasticSearchOutput.INDEX)):
            instance.indices.delete("{}_year_alias_date".format(TestElasticSearchOutput.INDEX))
        time.sleep(1)