skchem.target_prediction.PIDGIN - Code Metrics - Inspection of "updated pylintrc" - richlewis42/scikit-chem - Measure and Improve Code Quality continuously with Scrutinizer

Completed

Push — master ( 2bc047...202252 )

by Rich

created 2016-04-14 16:47 UTC

skchem.target_prediction.PIDGIN A

↳ Parent: Project

Complexity

Total Complexity

Size/Duplication

Total Lines	135
Duplicated Lines	0 %

Metric	Value
dl	0
loc	135
rs	10
wmc	27

14 Methods

Rating	Name	Duplication	Size	Complexity
A	PIDGIN.__init__()	0	9	2
A	PIDGIN._m_predict()	0	7	2
A	PIDGIN._map_predict_proba()	0	7	2
A	PIDGIN._df_map_predict_proba()	0	9	2
A	PIDGIN.map_predict_log_proba()	0	8	2
A	PIDGIN._df_predict_log_proba()	10	10	2
A	PIDGIN._map_predict()	0	7	2
A	PIDGIN._df_predict()	0	9	2
A	PIDGIN._df_map_predict_log_proba()	0	11	2
A	PIDGIN._df_map_predict()	0	9	2
A	PIDGIN._m_predict_log_proba()	0	9	2
A	PIDGIN.__call__()	0	2	1
A	PIDGIN._m_predict_proba()	0	9	2
A	PIDGIN._df_predict_proba()	11	11	2
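
As a quick consistency check, the `wmc` value reported above matches the sum of the per-method complexities in the table. The sketch below assumes that `wmc` stands for a weighted method count computed as a plain sum; the report itself does not define the metric, and the function name `weighted_method_count` is purely illustrative.

```python
# Per-method complexity values copied from the methods table above.
method_complexity = {
    "__init__": 2,
    "_m_predict": 2,
    "_map_predict_proba": 2,
    "_df_map_predict_proba": 2,
    "map_predict_log_proba": 2,
    "_df_predict_log_proba": 2,
    "_map_predict": 2,
    "_df_predict": 2,
    "_df_map_predict_log_proba": 2,
    "_df_map_predict": 2,
    "_m_predict_log_proba": 2,
    "__call__": 1,
    "_m_predict_proba": 2,
    "_df_predict_proba": 2,
}


def weighted_method_count(complexities):
    """Sum per-method complexities (assumed definition of wmc)."""
    return sum(complexities.values())


total = weighted_method_count(method_complexity)
print(total)  # 27, the same value as the reported wmc
```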

#! /usr/bin/env python
#
# Copyright (C) 2007-2009 Rich Lewis <[email protected]>
# License: 3-clause BSD

# The map functions are a stand in before parallelism is applied,
# so ignore the errors for using map + lambdas.

# pylint: disable=W0110


"""
skchem.target_prediction.PIDGIN

Wrapper for the PIDGIN models.
"""

import pandas as pd
from rdkit.Chem.rdMolDescriptors import GetMorganFingerprintAsBitVect
import gzip

import sys

# if cpickle available, import it.  otherwise use pickle
try:
    import cPickle as pickle
except ImportError:
    import pickle

from .target_prediction import AbstractTargetPredictionAlgorithm
from ..descriptors import skchemize
from ..data import resource

class PIDGIN(AbstractTargetPredictionAlgorithm):

    """ Class implementing the PIDGIN target prediction algorithm """

    def __init__(self):
        # fix py3 incompat by creating py2k and py3k PIDGIN models
        filename = 'models_{}{}.pkl.gz'.format(*sys.version_info[:2])

        with gzip.open(resource('PIDGIN', filename), 'rb') as f:

            self.models = pickle.load(f)
        self.fingerprint = skchemize(GetMorganFingerprintAsBitVect,
                                     radius=2, nBits=2048)
        self.targets = self.models.keys()

    def __call__(self, m):
        return self.predict_proba(m)

    def _m_predict(self, m):

        """ Predict binary binding profile for a molecule against 1080 protein targets """

        fp = self.fingerprint(m)

        return pd.Series((self.models[targ].predict(fp)[0] for targ in self.targets), \
                            index=self.targets)

    def _map_predict(self, m):


        """ Map based prediction for binary binding profile """

        fp = self.fingerprint(m)

        # take element [0] so each entry is a scalar, matching _m_predict
        return pd.Series(map(lambda k: self.models[k].predict(fp)[0], self.targets),
                         index=self.targets)

    def _m_predict_proba(self, m):

        """ Predict probability of molecule m binding to 1080 protein targets """

        fp = self.fingerprint(m)

        res = pd.Series(index=self.targets)
        for target in self.models:
            res[target] = self.models[target].predict_proba(fp)[:, 1][0]
        return res

    def _map_predict_proba(self, m):

        """ Map based prediction of the probability of molecule m binding to
        the 1080 protein targets """

        fp = self.fingerprint(m)

        return pd.Series(map(lambda k: self.models[k].predict_proba(fp)[:, 1][0],
                             self.targets), index=self.targets)

    def _m_predict_log_proba(self, m):

        """ Predict the log probability of molecule m binding to the 1080 proteins """

        fp = self.fingerprint(m)

        res = pd.Series(index=self.targets)
        # dict.items() works on both Python 2 and 3; iteritems() is Python 2 only
        for target, model in self.models.items():
            res[target] = model.predict_log_proba(fp)[:, 1][0]
        return res

    def map_predict_log_proba(self, m):

        """ Predict the log probability of molecule m binding to the 1080 proteins
        using map, for simple parallelism """

        fp = self.fingerprint(m)

        # index the model output, not the fingerprint
        return pd.Series(map(lambda k: self.models[k].predict_log_proba(fp)[:, 1][0],
                             self.targets), index=self.targets)

    def _df_predict(self, df):

        """ More efficient way to call predict on large scikit-chem style dataframes """

        fps = df.structure.apply(self.fingerprint)
        res = pd.DataFrame(index=fps.index, columns=self.targets)
        for target in self.models:
            res[target] = self.models[target].predict(fps)
        return res

    def _df_map_predict(self, df):


        """ More efficient way to call the predict on large scikit-chem style dataframes,
        with a map implementation for easy parallelism"""

        fps = df.structure.apply(self.fingerprint)

        return pd.DataFrame(map(lambda k: self.models[k].predict(fps), self.targets), \

                                        columns=fps.index, index=self.targets).T


    def _df_predict_proba(self, df):


        """ More efficient way to call the predict_proba on large scikit-chem style dataframes"""

        fps = df.structure.apply(self.fingerprint)
        res = pd.DataFrame(index=fps.index, columns=self.targets)

        #parallelize here
        for target in self.models:
            res[target] = self.models[target].predict_proba(fps)[:, 1]
        return res

    def _df_map_predict_proba(self, df):


        """ map based way to call the predict_proba on large scikit-chem style dataframes """

        fps = df.structure.apply(self.fingerprint)

        # parallelize here trivially
        return pd.DataFrame(map(lambda k: self.models[k].predict_proba(fps)[:, 1], self.targets), \

                                            columns=fps.index, index=self.targets).T

    def _df_predict_log_proba(self, df):

        """ More efficient way to call predict_log_proba on large scikit-chem style dataframes """

        fps = df.structure.apply(self.fingerprint)
        res = pd.DataFrame(index=fps.index, columns=self.targets)

        for target in self.models:
            res[target] = self.models[target].predict_log_proba(fps)[:, 1]
        return res

    def _df_map_predict_log_proba(self, df):


        """
        More efficient way to call predict_log_proba on large scikit-chem style dataframes,
        with a map implementation for easy parallelism
        """

        fps = df.structure.apply(self.fingerprint)

        return pd.DataFrame(map(lambda k: self.models[k].predict_log_proba(fps)[:, 1], \

                                        self.targets), columns=fps.index, index=self.targets).T
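
The header comment in the source says the `map` calls are stand-ins until parallelism is applied. A minimal sketch of what that swap could look like follows. It uses only the standard library, and `StubModel`, the free function `map_predict_proba`, and its `mapper` parameter are hypothetical names introduced for illustration; they are not part of scikit-chem or PIDGIN, and the real models return NumPy arrays rather than nested lists.

```python
from concurrent.futures import ThreadPoolExecutor


class StubModel:
    """Hypothetical stand-in for one per-target classifier."""

    def __init__(self, p_active):
        self.p_active = p_active

    def predict_proba(self, fp):
        # One row per input fingerprint: [P(inactive), P(active)].
        return [[1.0 - self.p_active, self.p_active]]


def map_predict_proba(models, fp, mapper=map):
    """Return {target: P(active)} using any map-compatible callable."""
    targets = list(models)
    probs = mapper(lambda k: models[k].predict_proba(fp)[0][1], targets)
    return dict(zip(targets, probs))


models = {"target_a": StubModel(0.9), "target_b": StubModel(0.2)}
fingerprint = [0, 1, 1, 0]  # placeholder, not a real Morgan fingerprint

# Serial version: the builtin map, as in the source above.
serial = map_predict_proba(models, fingerprint)

# Parallel version: Executor.map has the same calling shape and keeps order.
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel = map_predict_proba(models, fingerprint, mapper=pool.map)

print(serial == parallel)
```

Passing the mapping function in as an argument keeps the prediction logic identical in both cases, so the serial and parallel paths can be compared directly. Whether threads actually speed up the real models depends on how much of their work releases the interpreter lock, which this sketch does not measure.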


Issues

Line numbers refer to the source file. No issues are ignored.

Line	Code	Category	Introduced	Message
9	# pylint: disable=W0110	(none)	2016-01-19 16:21 UTC	Locally disabling deprecated-lambda (W0110)
17	import pandas as pd	Configuration	2016-01-19 16:21 UTC	The import `pandas` could not be resolved.
18	from rdkit.Chem.rdMolDescriptors import GetMorganFingerprintAsBitVect	Configuration	2016-01-19 16:21 UTC	The import `rdkit.Chem.rdMolDescriptors` could not be resolved.
41	with gzip.open(resource('PIDGIN', filename), 'rb') as f:	Coding Style, Naming	2016-04-14 16:48 UTC	The name `f` does not conform to the variable naming conventions (`[a-z_][a-z0-9_]{2,30}$`).
54	fp = self.fingerprint(m)	Coding Style, Naming	2016-04-14 16:48 UTC	The name `fp` does not conform to the variable naming conventions (`[a-z_][a-z0-9_]{2,30}$`).
58	def _map_predict(self, m):	Coding Style, Naming	2016-04-14 16:48 UTC	The name `m` does not conform to the argument naming conventions (`[a-z_][a-z0-9_]{2,30}$`).
62	fp = self.fingerprint(m)	Coding Style, Naming	2016-04-14 16:48 UTC	The name `fp` does not conform to the variable naming conventions (`[a-z_][a-z0-9_]{2,30}$`).
63	return pd.Series(map(lambda k: self.models[k].predict(fp), ...	(none)	2016-04-14 16:48 UTC	Used builtin function 'map'
70	fp = self.fingerprint(m)	Coding Style, Naming	2016-04-14 16:48 UTC	The name `fp` does not conform to the variable naming conventions (`[a-z_][a-z0-9_]{2,30}$`).
76	def _map_predict_proba(self, m):	Coding Style, Naming	2016-04-14 16:48 UTC	The name `m` does not conform to the argument naming conventions (`[a-z_][a-z0-9_]{2,30}$`).
80	fp = self.fingerprint(m)	Coding Style, Naming	2016-04-14 16:48 UTC	The name `fp` does not conform to the variable naming conventions (`[a-z_][a-z0-9_]{2,30}$`).
81	return pd.Series(map(lambda k: self.models[k].predict_proba(fp)[:, 1][0], ...	(none)	2016-04-14 16:48 UTC	Used builtin function 'map'
88	fp = self.fingerprint(m)	Coding Style, Naming	2016-04-14 16:48 UTC	The name `fp` does not conform to the variable naming conventions (`[a-z_][a-z0-9_]{2,30}$`).
94	def map_predict_log_proba(self, m):	Coding Style, Naming	2016-04-14 16:48 UTC	The name `m` does not conform to the argument naming conventions (`[a-z_][a-z0-9_]{2,30}$`).
99	fp = self.fingerprint(m)	Coding Style, Naming	2016-04-14 16:48 UTC	The name `fp` does not conform to the variable naming conventions (`[a-z_][a-z0-9_]{2,30}$`).
100	return pd.Series(map(lambda k: self.models[k].predict_log_proba(fp[:, 1][0]), ...	(none)	2016-04-14 16:48 UTC	Used builtin function 'map'
113	def _df_map_predict(self, df):	Coding Style, Naming	2016-04-14 16:48 UTC	The name `df` does not conform to the argument naming conventions (`[a-z_][a-z0-9_]{2,30}$`).
120	return pd.DataFrame(map(lambda k: self.models[k].predict(fps), self.targets), ...	(none)	2016-04-14 16:48 UTC	Used builtin function 'map'
124	def _df_predict_proba(self, df):	Duplication	2016-04-14 16:48 UTC	This code seems to be duplicated in your project.
136	def _df_map_predict_proba(self, df):	Coding Style, Naming	2016-04-14 16:48 UTC	The name `df` does not conform to the argument naming conventions (`[a-z_][a-z0-9_]{2,30}$`).
143	return pd.DataFrame(map(lambda k: self.models[k].predict_proba(fps)[:, 1], self.targets), ...	(none)	2016-04-14 16:48 UTC	Used builtin function 'map'
146	def _df_predict_log_proba(self, df):	Duplication	2016-04-14 16:48 UTC	This code seems to be duplicated in your project.
157	def _df_map_predict_log_proba(self, df):	Coding Style, Naming	2016-04-14 16:48 UTC	The name `df` does not conform to the argument naming conventions (`[a-z_][a-z0-9_]{2,30}$`).
166	return pd.DataFrame(map(lambda k: self.models[k].predict_log_proba(fps)[:, 1], ...	(none)	2016-04-14 16:48 UTC	Used builtin function 'map'

Naming issues

This check looks for invalid names for a range of different identifiers. You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements. If your project includes a Pylint configuration file, the settings contained in that file take precedence. To find out more about Pylint, please refer to their site.

Unresolved imports

An import that could not be resolved can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

    # .scrutinizer.yml
    before_commands:
        - sudo pip install abc # Python2
        - sudo pip3 install abc # Python3

Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing `__init__.py` files in your module folders. Make sure that you place one file in each sub-folder.
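
To see why short names such as `f`, `fp`, `m` and `df` are flagged by the naming issues above while `fps` and `res` are not, the quoted pattern can be tried directly. This is a minimal sketch that assumes the pattern is matched from the start of the name; the helper `conforms` is a hypothetical name, not a Pylint function.

```python
import re

# Pattern quoted in the naming issues above: one leading character
# followed by 2 to 30 more, so conforming names are 3 to 31 characters long.
NAME_RE = re.compile(r"[a-z_][a-z0-9_]{2,30}$")


def conforms(name):
    """Return True if the name satisfies the quoted naming pattern."""
    return NAME_RE.match(name) is not None


names = ["f", "fp", "m", "df", "fps", "res", "target"]
flagged = [n for n in names if not conforms(n)]
print(flagged)  # ['f', 'fp', 'm', 'df']
```

As the issue text notes, a project's own Pylint configuration takes precedence, so a project that prefers short names like these could relax the pattern there rather than rename the variables.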
