ChEMBLConverter - Code Metrics - richlewis42/scikit-chem

ChEMBLConverter A
last analyzed 2016-09-01 14:43 UTC


Complexity

Total Complexity

Size/Duplication

Total Lines	44
Duplicated Lines	0 %

Test Coverage

Coverage

17.65%

Importance

Changes	2
Bugs	0	Features	1

Metric	Value
wmc	2
c	2
b	0
f	1
dl	0
loc	44
ccs	3
cts	17
cp	0.1765
rs	10
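
The coverage figure can be cross-checked against the metric table. The short sketch below assumes that `ccs` is the number of covered statements, `cts` the total number of statements, and `cp` the resulting coverage proportion; the report does not define these abbreviations, so that reading is a guess that happens to agree with the numbers shown.

```python
# Assumed meanings (not defined in the report):
#   ccs = covered statements, cts = total statements, cp = coverage proportion
ccs = 3
cts = 17

cp = round(ccs / cts, 4)                      # proportion, as in the `cp` row
coverage_percent = round(100 * ccs / cts, 2)  # percentage, as in "Coverage"

print(cp, coverage_percent)
```

Under that assumption, 3 of 17 statements gives the 0.1765 shown for `cp` and the 17.65% shown under Test Coverage.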

2 Methods

Rating	Name	Duplication	Size	Complexity
B	__init__()	0	33	1
A	parse_infile()	0	5	1

#! /usr/bin/env python
#
# Copyright (C) 2016 Rich Lewis <[email protected]>
# License: 3-clause BSD

"""
# skchem.data.converters.chembl

Dataset constructor for ChEMBL
"""
import logging
import pandas as pd
import os

from .base import Converter, default_pipeline, contiguous_order, Feature
from ...cross_validation import SimThresholdSplit
from ... import features

LOGGER = logging.getLogger(__name__)


class ChEMBLConverter(Converter):

    """ Converter for the ChEMBL dataset. """

    def __init__(self, directory, output_directory, output_filename='chembl.h5'):

        output_path = os.path.join(output_directory, output_filename)

        infile = os.path.join(directory, 'chembl_raw.h5')
        ms, y = self.parse_infile(infile)


        pipeline = default_pipeline()

        ms, y = pipeline.transform_filter(ms, y)


        cv = SimThresholdSplit(min_threshold=0.6, n_jobs=-1).fit(ms)

        train, valid, test = cv.split((70, 15, 15))
        (ms, y, train, valid, test) = contiguous_order((ms, y, train, valid, test), (train, valid, test))

        splits = (('train', train), ('valid', valid), ('test', test))

        feats = (
        Feature(fper=features.MorganFeaturizer(),
                key='X_morg',
                axis_names=['batch', 'features']),
        Feature(fper=features.PhysicochemicalFeaturizer(),
                key='X_pc',
                axis_names=['batch', 'features']),
        Feature(fper=features.AtomFeaturizer(max_atoms=100),
                key='A',
                axis_names=['batch', 'atom_idx', 'features']),
        Feature(fper=features.GraphDistanceTransformer(max_atoms=100),
                key='G',
                axis_names=['batch', 'atom_idx', 'atom_idx']),
        Feature(fper=features.SpacialDistanceTransformer(max_atoms=100),
                key='G_d'))

        self.run(ms, y, output_path, features=feats, splits=splits)


    def parse_infile(self, filename):

        ms = pd.read_hdf(filename, 'structure')

        y = pd.read_hdf(filename, 'targets/Y')

        return ms, y

if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG)
    LOGGER.info('Converting ChEMBL...')
    ChEMBLConverter.convert()
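
The report raises two structural points about this class: `__init__` does not call the super-class `__init__`, and `parse_infile` has no docstring and does not use `self`. The following is a minimal sketch of how both could be addressed, not the scikit-chem implementation. The `Converter` stub, the class name `ChEMBLConverterSketch`, the stored attributes, and the `reader` parameter are all hypothetical and exist only to keep the example self-contained; in the real module the reader would presumably be `pd.read_hdf`, and the real `Converter.__init__` may take arguments.

```python
import os


class Converter:
    """Hypothetical stand-in for the real base class in `.base`."""

    def __init__(self):
        self.initialised = True


class ChEMBLConverterSketch(Converter):
    """Illustrative subclass showing the two suggested changes."""

    def __init__(self, directory, output_directory, output_filename='chembl.h5'):
        # Initialise the super-class first, as the report advises.
        super().__init__()
        self.output_path = os.path.join(output_directory, output_filename)
        self.infile = os.path.join(directory, 'chembl_raw.h5')

    @staticmethod
    def parse_infile(filename, reader):
        """Read structures and targets from `filename` and return them as a pair.

        `reader` is a hypothetical callable taking (filename, key); it is
        injected here so the sketch runs without pandas or an HDF5 file.
        """
        structures = reader(filename, 'structure')
        targets = reader(filename, 'targets/Y')
        return structures, targets
```

Because `parse_infile` never touches instance state, declaring it a static method makes that explicit and lets it be called without constructing a converter.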


Issues

Line numbers refer to the source file listed above. 0 ignored issues.

Line 12: `import pandas as pd`
    Configuration, introduced 2016-08-15 13:30 UTC
    The import `pandas` could not be resolved. This can be caused by one of the following:

    1. Missing Dependencies
       This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

           # .scrutinizer.yml
           before_commands:
               - sudo pip install abc # Python2
               - sudo pip3 install abc # Python3

       Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

    2. Missing __init__.py files
       This error could also result from missing `__init__.py` files in your module folders. Make sure that you place one file in each sub-folder.

Line 26: `def __init__(self, directory, output_directory, output_filename='chembl.h5'):`
    Bug, introduced 2016-08-15 13:30 UTC
    The `__init__` method of the super-class `Converter` is not called. It is generally advisable to initialize the super-class by calling its `__init__` method:

        class SomeParent:
            def __init__(self):
                self.x = 1

        class SomeChild(SomeParent):
            def __init__(self):
                # Initialize the super class
                SomeParent.__init__(self)

Lines 31, 35, 37, 39, 63, 64: names `ms`, `y` and `cv`
    Coding Style (Naming), introduced 2016-08-15 13:30 UTC
    The name does not conform to the variable naming conventions (`[a-z_][a-z0-9_]{2,30}$`). Reported for:
        `ms` on lines 31, 35, 39 and 63
        `y` on lines 31, 35, 39 and 64
        `cv` on line 37
    This check looks for invalid names for a range of different identifiers. You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements. If your project includes a Pylint configuration file, the settings contained in that file take precedence. To find out more about Pylint, please refer to their site.

Line 39: `(ms, y, train, valid, test) = contiguous_order((ms, y, train, valid, test), (train, valid, test))`
    Coding Style, introduced 2016-08-15 13:30 UTC
    This line is too long as per the coding-style (105/100). This check looks for lines that are too long. You can specify the maximum line length.

Lines 43, 46, 49, 52, 55: the five `Feature(fper=...)` calls inside `feats`
    Coding Style, introduced 2016-08-25 16:29 UTC
    Wrong hanging indentation. Reported for the calls using `MorganFeaturizer`, `PhysicochemicalFeaturizer`, `AtomFeaturizer`, `GraphDistanceTransformer` and `SpacialDistanceTransformer`.

Lines 49, 52, 55: `AtomFeaturizer(max_atoms=100)`, `GraphDistanceTransformer(max_atoms=100)`, `SpacialDistanceTransformer(max_atoms=100)`
    Bug, introduced 2016-08-16 15:32 UTC
    The keyword `max_atoms` does not seem to exist for the constructor call.

Line 61: `def parse_infile(self, filename):`
    Coding Style, introduced 2016-08-15 13:30 UTC
    This method should have a docstring. The coding style of this project requires that you add a docstring to this code element. Below, you find an example for methods:

        class SomeClass:
            def some_method(self):
                """Do x and return foo."""

    If you would like to know more about docstrings, we recommend to read PEP-257: Docstring Conventions.

    Coding Style, introduced 2016-08-15 13:30 UTC
    This method could be written as a function/class method. If a method does not access any attributes of the class, it could also be implemented as a function or static method. This can help improve readability. For example

        class Foo:
            def some_method(self, x, y):
                return x + y;

    could be written as

        class Foo:
            @classmethod
            def some_method(cls, x, y):
                return x + y;
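
To see why `ms`, `y` and `cv` are rejected by the naming pattern quoted in the report, the pattern can be applied directly. This is a small sketch that assumes the check anchors the pattern at the start of the name, as `re.match` does; the helper name `conforms` and the sample names other than `ms`, `y` and `cv` are illustrative only.

```python
import re

# Variable-name pattern quoted in the report.
NAME_PATTERN = re.compile(r'[a-z_][a-z0-9_]{2,30}$')


def conforms(name):
    """Return True if `name` matches the pattern from its first character."""
    return NAME_PATTERN.match(name) is not None


for name in ('ms', 'y', 'cv', 'train', 'output_path'):
    print(name, conforms(name))
```

The pattern requires one leading character followed by 2 to 30 more, so any name shorter than three characters fails, while names such as `train` or `output_path` pass.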
