Passed
Push to main (081b85...deff66) by Mohammad
02:40, queued 12s
created

tracking_policy_agendas.word2vec.w2v_emb   A

Complexity

Total Complexity 13

Size/Duplication

Total Lines 53
Duplicated Lines 79.25 %

Test Coverage

Coverage 80%

Importance

Changes 0
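Metric  Value  Meaning
wmc     13     weighted method count (sum of method complexities)
eloc    44
dl      42     duplicated lines
loc     53     lines of code
ccs     32     covered statements
cts     40     statements
cp      0.8    coverage ratio (ccs / cts)
rs      10
c       0
b       0
f       0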

8 Methods

Rating  Name                          Duplication  Size  Complexity
A       W2VEmb.__init()               5            5     1
A       W2VEmb.__init__()             5            5     2
A       W2VEmb.tf_idf_transformer()   6            6     1
A       W2VEmb.__getitem__()          3            3     2
A       W2VEmb.tf_idf_mean()          4            4     2
A       W2VEmb.load()                 3            3     2
A       W2VEmb.save()                 3            3     2
A       W2VEmb.encode()               5            5     1
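The summary figures can be cross-checked against the method table. A small sketch, assuming `wmc` is the plain sum of the per-method complexities and that the duplication and coverage figures are `dl / loc` and `ccs / cts`:

```python
# Per-method complexity, copied from the method table.
complexities = {
    "W2VEmb.__init()": 1,
    "W2VEmb.__init__()": 2,
    "W2VEmb.tf_idf_transformer()": 1,
    "W2VEmb.__getitem__()": 2,
    "W2VEmb.tf_idf_mean()": 2,
    "W2VEmb.load()": 2,
    "W2VEmb.save()": 2,
    "W2VEmb.encode()": 1,
}

# Assumption: wmc (weighted method count) is the sum of method complexities.
wmc = sum(complexities.values())

# Duplicated lines (dl = 42) over total lines (loc = 53), as a percentage.
duplication_pct = round(42 / 53 * 100, 2)

# Covered statements (ccs = 32) over statements (cts = 40).
coverage = 32 / 40

print(wmc, duplication_pct, coverage)  # 13 79.25 0.8
```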

How to fix: Duplicated Code

Duplicate code is one of the most pungent code smells. A commonly used rule is to restructure code once it is duplicated in three or more places.

Common duplication problems and their corresponding solutions are:

import pickle
Issue: Missing module docstring
import gensim
import numpy as np
import pandas as pd
from gensim import utils
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from .w2v_corpus import W2VCorpus


class W2VEmb:
Issue: Missing class docstring
Issue (Duplication): This code seems to be duplicated in your project.
    def __init__(self, text_document=None):
        self.wv2_corpus = None
        self.w2v_model = None
        self.tf_idf_transformation = None
        if text_document is not None: self.__init(text_document)
Issue (Coding Style): More than one statement on a single line

    def __init(self, text_document: pd.Series):
        text_document = text_document.fillna('')
        self.tf_idf_transformation = self.tf_idf_transformer(text_document)
        self.wv2_corpus = W2VCorpus(text_document)
        self.w2v_model = gensim.models.Word2Vec(sentences=self.wv2_corpus, min_count=1, vector_size=900, epochs=50)
Issue (Coding Style): This line is too long as per the coding-style (115/100).

This check looks for lines that are too long. You can specify the maximum line length.

    def __getitem__(self, text: str) -> np.ndarray:
        try:    return self.w2v_model.wv[text]
1 ignored issue
Issue (Coding Style): More than one statement on a single line
Issue (Coding Style): Exactly one space required after :
        except: return np.array([0 for _ in range(0, self.w2v_model.vector_size)])
Issue (Coding Style): More than one statement on a single line
Issue (Coding Style Best Practice): General except handlers without types should be used sparingly.

Typically, you would use general except handlers when you intend to handle all types of errors, e.g. when logging. Otherwise, such general error handlers can mask errors in your application that you want to know about.
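A sketch of a narrower `__getitem__`, assuming the only expected failure is an out-of-vocabulary word: gensim's `KeyedVectors` raises `KeyError` in that case, so catching `KeyError` keeps other errors visible, and `np.zeros` replaces the list comprehension. `VectorLookup`, its plain-dict `vectors` and `lookup_tokens` are hypothetical stand-ins for `self.w2v_model.wv` and `self[stream]`. `lookup_tokens` looks words up one at a time because, when `wv` is indexed with a whole token list as `encode` does, a single unknown token sends the original into the `except` branch and returns one zero vector for the entire list.

```python
import numpy as np


class VectorLookup:
    """Hypothetical stand-in for W2VEmb.__getitem__, backed by a plain dict.

    In W2VEmb, ``vectors`` would be ``self.w2v_model.wv`` and ``vector_size``
    would be ``self.w2v_model.vector_size`` (assumptions of this sketch).
    """

    def __init__(self, vectors: dict, vector_size: int):
        self.vectors = vectors
        self.vector_size = vector_size

    def __getitem__(self, text: str) -> np.ndarray:
        try:
            return np.asarray(self.vectors[text], dtype=float)
        except KeyError:
            # Only an out-of-vocabulary word falls back to zeros; any other
            # exception still propagates instead of being masked.
            return np.zeros(self.vector_size)

    def lookup_tokens(self, tokens: list) -> np.ndarray:
        """Return one row per token, shape (len(tokens), vector_size)."""
        rows = [self[token] for token in tokens]
        # reshape keeps the 2-D shape even for an empty token list.
        return np.array(rows, dtype=float).reshape(len(tokens), self.vector_size)
```

With `vectors={"tax": np.array([1.0, 2.0])}` and `vector_size=2`, `lookup_tokens(["tax", "budget"])` has shape `(2, 2)`: one real row and one zero row for the unknown word.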
    def tf_idf_transformer(self, text_series):
Issue (Coding Style): This method could be written as a function/class method.

If a method does not access any attributes of the class, it could also be implemented as a function or static method. This can help improve readability. For example:

class Foo:
    def some_method(self, x, y):
        return x + y

could be written as:

class Foo:
    @staticmethod
    def some_method(x, y):
        return x + y

Issue: Missing function or method docstring
        tfidf = Pipeline([('count', CountVectorizer(encoding='utf-8', min_df=3, #max_df=0.9,
                                                    max_features=900,
                                                    ngram_range=(1, 2))),
                          ('tfid', TfidfTransformer(sublinear_tf=True, norm='l2'))]).fit(text_series.ravel())
Issue (Coding Style): This line is too long as per the coding-style (109/100).
        return tfidf

    def encode(self, text: str) -> np.ndarray:
Issue: Missing function or method docstring
        stream = utils.simple_preprocess(text)
        tf_idf_vec = self.tf_idf_transformation.transform(stream).toarray()
        w2v_encode = self[stream]
        return np.mean(list(self.tf_idf_mean(tf_idf_vec, w2v_encode)), axis=0)
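As written, `encode` passes the token list from `utils.simple_preprocess` to `transform`, which treats each token as a separate one-word document. Each row of `tf_idf_vec` is therefore indexed by vocabulary term (`max_features=900`), while each row of `w2v_encode` is indexed by embedding dimension (`vector_size=900`), and the element-wise product in `tf_idf_mean` only lines up because both sizes happen to be 900. A common alternative, swapped in here and not taken from the project, scales each word vector by that word's scalar TF-IDF weight and divides by the sum of the weights. This sketch assumes the weights are already available as a plain `dict`; `tfidf_weighted_mean` and its parameters are hypothetical.

```python
import numpy as np


def tfidf_weighted_mean(tokens, word_vectors, tfidf_weights, vector_size):
    """Average word vectors, weighting each token by a scalar TF-IDF score.

    word_vectors: mapping token -> 1-D vector (stand-in for a Word2Vec ``wv``).
    tfidf_weights: mapping token -> float (hypothetical per-document weights).
    Tokens missing from either mapping are skipped.
    """
    total = np.zeros(vector_size)
    weight_sum = 0.0
    for token in tokens:
        if token in word_vectors and token in tfidf_weights:
            weight = tfidf_weights[token]
            total += weight * np.asarray(word_vectors[token], dtype=float)
            weight_sum += weight
    if weight_sum == 0.0:
        # No usable tokens: return zeros rather than the NaN that
        # np.mean produces for an empty list.
        return total
    return total / weight_sum
```

For example, with vectors `tax -> [1, 0]` and `reform -> [0, 1]` and weights 3 and 1, the tokens `["tax", "reform", "the"]` encode to `[0.75, 0.25]`; the unknown `the` is skipped.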
    def save(self, path: str):
Issue: Missing function or method docstring
        with open(path, 'wb') as f:
Issue (Coding Style Naming): Variable name "f" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to the Pylint website.
            pickle.dump(self, f, protocol=pickle.HIGHEST_PROTOCOL)

    def load(self, path: str):
Issue: Missing function or method docstring
        with open(path, 'rb') as f:
Issue (Coding Style Naming): Variable name "f" doesn't conform to snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern)
            self.__dict__.update(pickle.load(f).__dict__)
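Pylint's default variable-name pattern, as quoted in the message, requires names of at least three characters, so a descriptive name such as `file_handle` clears the naming warning, and docstrings address the missing-docstring issues. In this sketch, `PicklableModel` and `round_trip` are hypothetical stand-ins; the `save` and `load` bodies are those of `W2VEmb`, unchanged apart from the variable name.

```python
import os
import pickle
import tempfile


class PicklableModel:
    """Hypothetical minimal class with W2VEmb-style save/load."""

    def __init__(self, payload=None):
        self.payload = payload

    def save(self, path: str) -> None:
        """Pickle the whole object to ``path``."""
        with open(path, "wb") as file_handle:
            pickle.dump(self, file_handle, protocol=pickle.HIGHEST_PROTOCOL)

    def load(self, path: str) -> None:
        """Replace this object's state with the object pickled at ``path``."""
        with open(path, "rb") as file_handle:
            self.__dict__.update(pickle.load(file_handle).__dict__)


def round_trip(payload):
    """Save a model to a temporary file and load it into a fresh instance."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "model.pkl")
        PicklableModel(payload).save(path)
        restored = PicklableModel()
        restored.load(path)
    return restored.payload
```

`pickle.load` can run arbitrary code from a crafted file, so `load` should only be pointed at files you trust.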
    @staticmethod
    def tf_idf_mean(tf_idf_vec: np.ndarray, w2v_encode: np.ndarray):
Issue: Missing function or method docstring
        for ind in range(len(tf_idf_vec)):
Issue (unused-code): Consider using enumerate instead of iterating with range and len
            yield tf_idf_vec[ind]*w2v_encode[ind]
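Pylint suggests `enumerate`, but the index is only used to pair rows of two arrays, so `zip` removes it entirely. This sketch keeps the generator interface of the original `tf_idf_mean`. One behavioural difference: `zip` stops at the shorter input, whereas the index-based loop raises `IndexError` when `w2v_encode` has fewer rows than `tf_idf_vec`.

```python
import numpy as np


def tf_idf_mean(tf_idf_vec: np.ndarray, w2v_encode: np.ndarray):
    """Yield the element-wise product of each pair of corresponding rows."""
    # zip pairs the rows directly, so no index variable is needed.
    for tf_idf_row, w2v_row in zip(tf_idf_vec, w2v_encode):
        yield tf_idf_row * w2v_row
```

When both inputs are 2-D arrays of the same shape, `np.mean(tf_idf_vec * w2v_encode, axis=0)` gives the same result without the generator.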