tracking_policy_agendas.word2vec.w2v_emb - Code Metrics - Inspection of "Test and coverage" - MohammadForouhesh/tracking-policy-agendas - Scrutinizer

Test Failed

Pull Request — main (#5)

by Mohammad

created 2022-03-13 12:05 UTC

tracking_policy_agendas.word2vec.w2v_emb A

Complexity

Total Complexity

Size/Duplication

Total Lines	53
Duplicated Lines	79.25 %

Importance

Changes

Metric	Value
wmc	13
eloc	44
dl	42
loc	53
rs	10
c	0
b	0
f	0
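
The "Duplicated Lines" percentage above appears to follow from two rows of the metric table. The page does not define the abbreviations, so this small check assumes that `dl` counts duplicated lines and `loc` counts total lines (which agrees with "Total Lines 53").

```python
# Assumption: `dl` = duplicated lines, `loc` = total lines of code.
# The inspection page does not spell these abbreviations out.
dl = 42
loc = 53

# Share of duplicated lines, rounded to two decimals as on the page.
duplicated_percent = round(dl / loc * 100, 2)
print(f"{duplicated_percent} %")  # prints "79.25 %"
```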

8 Methods

Rating	Name	Duplication	Size	Complexity
A	W2VEmb.__init()	5	5	1
A	W2VEmb.__init__()	5	5	2
A	W2VEmb.tf_idf_transformer()	6	6	1
A	W2VEmb.__getitem__()	3	3	2
A	W2VEmb.tf_idf_mean()	4	4	2
A	W2VEmb.load()	3	3	2
A	W2VEmb.save()	3	3	2
A	W2VEmb.encode()	5	5	1
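
The Complexity column can be cross-checked against the `wmc` metric. The sketch below assumes that `wmc` is the sum of the per-method complexity values (the usual "weighted methods per class" reading); the page itself does not say so.

```python
# Complexity column of the methods table, in row order.
method_complexity = {
    "W2VEmb.__init()": 1,
    "W2VEmb.__init__()": 2,
    "W2VEmb.tf_idf_transformer()": 1,
    "W2VEmb.__getitem__()": 2,
    "W2VEmb.tf_idf_mean()": 2,
    "W2VEmb.load()": 2,
    "W2VEmb.save()": 2,
    "W2VEmb.encode()": 1,
}

# Assumption: wmc = sum of the per-method complexities.
total_complexity = sum(method_complexity.values())
print(len(method_complexity), total_complexity)  # prints "8 13"
```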

import pickle
# Issue, introduced 2022-03-13 12:10 UTC: Missing module docstring
import gensim
import numpy as np
import pandas as pd
from gensim import utils
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from .w2v_corpus import W2VCorpus


class W2VEmb:
    # Issue, introduced 2022-03-13 12:10 UTC: Missing class docstring
    # Issue [Duplication], introduced 2022-03-13 12:10 UTC: This code seems to be duplicated in your project.
    def __init__(self, text_document=None):
        self.wv2_corpus = None
        self.w2v_model = None
        self.tf_idf_transformation = None
        if text_document is not None: self.__init(text_document)
        # Issue [Coding Style], introduced 2022-03-13 12:10 UTC: More than one statement on a single line

    def __init(self, text_document: pd.Series):
        text_document = text_document.fillna('')
        self.tf_idf_transformation = self.tf_idf_transformer(text_document)
        self.wv2_corpus = W2VCorpus(text_document)
        self.w2v_model = gensim.models.Word2Vec(sentences=self.wv2_corpus, min_count=1, vector_size=900, epochs=50)
        # Issue [Coding Style], introduced 2022-03-13 12:10 UTC: This line is too long as per the coding-style (115/100).
        #   This check looks for lines that are too long. You can specify the maximum line length.

    def __getitem__(self, text: str) -> np.ndarray:
        try: return self.w2v_model.wv[text]
        # 1 ignored issue on this line.
        # Issue [Coding Style], introduced 2022-03-13 12:10 UTC: More than one statement on a single line
        # Issue [Coding Style], introduced 2022-03-13 12:10 UTC: Exactly one space required after :
        except: return np.array([0 for _ in range(0, self.w2v_model.vector_size)])
        # Issue [Coding Style], introduced 2022-03-13 12:10 UTC: More than one statement on a single line
        # Issue [Coding Style, Best Practice], introduced 2022-03-13 12:10 UTC: General except handlers without types
        #   should be used sparingly. Typically, you would use general except handlers when you intend to
        #   specifically handle all types of errors, for example when logging. Otherwise, such general error
        #   handlers can mask errors in your application that you want to know of.

    def tf_idf_transformer(self, text_series):
        # Issue [Coding Style], introduced 2022-03-13 12:10 UTC: This method could be written as a
        #   function/class method. If a method does not access any attributes of the class, it could also be
        #   implemented as a function or static method. This can help improve readability. For example,
        #       class Foo:
        #           def some_method(self, x, y):
        #               return x + y
        #   could be written as
        #       class Foo:
        #           @classmethod
        #           def some_method(cls, x, y):
        #               return x + y
        # Issue, introduced 2022-03-13 12:10 UTC: Missing function or method docstring
        tfidf = Pipeline([('count', CountVectorizer(encoding='utf-8', min_df=3, #max_df=0.9,
                                                    max_features=900,
                                                    ngram_range=(1, 2))),
                          ('tfid', TfidfTransformer(sublinear_tf=True, norm='l2'))]).fit(text_series.ravel())
        # Issue [Coding Style], introduced 2022-03-13 12:10 UTC: This line is too long as per the coding-style (109/100).
        #   This check looks for lines that are too long. You can specify the maximum line length.
        return tfidf

    def encode(self, text: str) -> np.ndarray:
        # Issue, introduced 2022-03-13 12:10 UTC: Missing function or method docstring
        stream = utils.simple_preprocess(text)
        tf_idf_vec = self.tf_idf_transformation.transform(stream).toarray()
        w2v_encode = self[stream]
        return np.mean(list(self.tf_idf_mean(tf_idf_vec, w2v_encode)), axis=0)

    def save(self, path: str):
        # Issue, introduced 2022-03-13 12:10 UTC: Missing function or method docstring
        with open(path, 'wb') as f:
            # Issue [Coding Style, Naming], introduced 2022-03-13 12:10 UTC: Variable name "f" doesn't conform to
            #   snake_case naming style ('([^\\W\\dA-Z][^\\WA-Z]{2,}|_[^\\WA-Z]*|__[^\\WA-Z\\d_][^\\WA-Z]+__)$' pattern).
            #   This check looks for invalid names for a range of different identifiers. You can set regular
            #   expressions to which the identifiers must conform if the defaults do not match your requirements.
            #   If your project includes a Pylint configuration file, the settings contained in that file take
            #   precedence. To find out more about Pylint, please refer to their site.
            pickle.dump(self, f, protocol=pickle.HIGHEST_PROTOCOL)

    def load(self, path: str):
        # Issue, introduced 2022-03-13 12:10 UTC: Missing function or method docstring
        with open(path, 'rb') as f:
            # Issue [Coding Style, Naming], introduced 2022-03-13 12:10 UTC: Variable name "f" doesn't conform to
            #   snake_case naming style (same pattern and explanation as in save() above).
            self.__dict__.update(pickle.load(f).__dict__)

    @staticmethod
    def tf_idf_mean(tf_idf_vec: np.ndarray, w2v_encode: np.ndarray):
        # Issue, introduced 2022-03-13 12:10 UTC: Missing function or method docstring
        for ind in range(len(tf_idf_vec)):
            # Issue [unused-code], introduced 2022-03-13 12:10 UTC: Consider using enumerate instead of iterating
            #   with range and len
            yield tf_idf_vec[ind]*w2v_encode[ind]
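
Several of the style issues reported for `__getitem__` and `tf_idf_mean` (bare `except`, more than one statement per line, iterating with `range(len(...))`) could be addressed roughly as follows. This is a minimal sketch, not the project's code: `ZeroFallbackLookup` is a hypothetical stand-in for `W2VEmb`, and a plain dict replaces `w2v_model.wv` on the assumption that an out-of-vocabulary lookup raises `KeyError`, as it does for gensim's keyed vectors.

```python
import numpy as np


class ZeroFallbackLookup:
    """Hypothetical stand-in for W2VEmb, reduced to the lookup logic."""

    def __init__(self, vectors: dict, vector_size: int):
        # `vectors` plays the role of `w2v_model.wv`. A dict raises KeyError
        # for unknown keys, which is the behaviour assumed in this sketch.
        self.vectors = vectors
        self.vector_size = vector_size

    def __getitem__(self, token: str) -> np.ndarray:
        """Return the vector for `token`, or zeros if it is unknown."""
        try:
            return self.vectors[token]
        except KeyError:
            # Only the out-of-vocabulary case is handled here;
            # any other error propagates to the caller.
            return np.zeros(self.vector_size)


def tf_idf_mean(tf_idf_vec, w2v_encode):
    """Yield the element-wise product of each pair of rows."""
    for tf_idf_row, w2v_row in zip(tf_idf_vec, w2v_encode):
        yield tf_idf_row * w2v_row
```

Two behavioural differences from the listing are worth noting: `np.zeros` returns floats where the listing builds an integer array, and `zip` stops at the shorter input where indexing with `range(len(tf_idf_vec))` would raise `IndexError`.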
