ChemAxonStandardizer.transform() - Code Metrics - Inspection of "made atom descriptors take 75 atoms by default" - richlewis42/scikit-chem - Measure and Improve Code Quality continuously with Scrutinizer

Completed

Push — master ( e4a84f...d41f63 )

by Rich

created 2016-06-12 20:24 UTC

ChemAxonStandardizer.transform() B

↳ Parent: ChemAxonStandardizer

Complexity

Conditions

Size

Total Lines

Duplication

Lines	0
Ratio	0 %

Importance

Changes	2
Bugs	0	Features	1

Metric	Value
c	2
b	0
f	1
dl	0
loc	25
rs	8.5806
cc	4

#! /usr/bin/env python

#
# Copyright (C) 2016 Rich Lewis <[email protected]>
# License: 3-clause BSD

"""
## skchem.standardizers.chemaxon

Module wrapping ChemAxon Standardizer.  Must have standardizer installed and
license activated.
"""

import os
import re
from tempfile import NamedTemporaryFile
import subprocess
import logging

logger = logging.getLogger(__name__)


import numpy as np
import pandas as pd

from .. import core
from .. import io

# ideally we will programmatically build this file, but for now just use it.
DEFAULT_CONFIG = os.path.join(os.path.dirname(__file__), 'default_config.xml')

class ChemAxonStandardizer(object):

    """ Object wrapping the ChemAxon Standardizer, for standardizing molecules.

    Args:
        config_path (str):
            The path of the config_file. If None, use the default one.
    """
    def __init__(self, config_path=None, warn_on_fail=True, error_on_fail=False,
                    keep_failed=False):


        if not config_path:
            config_path = DEFAULT_CONFIG
        self.config_path = config_path
        self.keep_failed = keep_failed
        self.error_on_fail = error_on_fail
        self.warn_on_fail = warn_on_fail

    def transform(self, obj):

        """ Standardize compounds.

        Args:
            obj (skchem.Mol, pd.Series or pd.DataFrame):
                The object to standardize: a Mol, or a series or dataframe of
                Mols or smiles strings. A bare string is not handled.

        Returns:
            skchem.Mol or pd.Series or pd.DataFrame:
                The standardized molecule, or molecules as a series or
                dataframe.
        """

        if isinstance(obj, core.Mol):
            return self._transform_mol(obj)
        elif isinstance(obj, pd.Series):
            return self._transform_ser(obj)
        elif isinstance(obj, pd.DataFrame):
            res = self._transform_ser(obj.structure)
            return res.to_frame(name='structure').join(obj.drop('structure', axis=1))

        else:
            raise NotImplementedError

    def _transform_mol(self, mol):
        mol = pd.DataFrame([mol], index=[mol.name], columns=['structure'])
        return self.transform(mol).structure.iloc[0]

    def _transform_mols(self, X, by='sdf'):


        with NamedTemporaryFile() as f_in, NamedTemporaryFile() as f_out:
            getattr(io, 'write_' + by)(X, f_in.name)
            errs = self._transform_file(f_in.name, f_out.name)
            out = io.read_sdf(f_out.name).structure
        return out, errs

    def _transform_smis(self, X):

        with NamedTemporaryFile() as f_in, NamedTemporaryFile() as f_out:
            X.to_csv(f_in.name, header=None, index=None)
            errs = self._transform_file(f_in.name, f_out.name)
            out = io.read_sdf(f_out.name).structure
        return out, errs

    def _transform_file(self, f_in, f_out):
        args = ['standardize', f_in,
                         '-c', self.config_path,
                         '-f', 'sdf',
                         '-o', f_out,
                         '--ignore-error']

        logger.debug('Running %s', ' '.join(args))
        sub = subprocess.Popen(args, stderr=subprocess.PIPE)
        errs = sub.stderr.read().decode('ascii')
        if len(errs):
            logger.debug('stderr from Standardizer: \n%s', errs)
            errs = errs.strip().split('\n')
            errs = [re.findall('No. ([0-9]+):', err) for err in errs]
            errs = [int(err[0]) - 1 for err in errs if len(err)]
        return errs

    def _transform_ser(self, X, y=None):


        # TODO: try using different serializations

        if isinstance(X.iloc[0], core.Mol):
            out, errs = self._transform_mols(X)
        elif isinstance(X.iloc[0], str):
            out, errs = self._transform_smis(X)
        else:
            # Fail clearly instead of hitting an unbound `out` below.
            raise NotImplementedError
        if errs:
            out.index = X.index.delete(errs)
            for err in errs:
                err = X.index[err]
                if self.error_on_fail:
                    raise ValueError('{} failed to standardize'.format(err))
                if self.warn_on_fail:
                    logger.warning('%s failed to standardize', err)
        else:
            out.index = X.index
        if self.keep_failed:
            out_c = X.copy()
            out_c.loc[out.index] = out
            return out_c
        else:
            return out
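
The stderr handling in `_transform_file` can be sketched in isolation as follows. This is a minimal illustration, not the project's code: the helper name `parse_failed_indices` and the sample stderr text are hypothetical, and the real Standardizer output format is only inferred from the `No. ([0-9]+):` pattern used in the listing. The pattern is written here as a raw string with the dot escaped.

```python
import re

# Same pattern as in _transform_file, as a raw string with the dot escaped.
FAILED_RECORD = re.compile(r'No\. ([0-9]+):')


def parse_failed_indices(stderr_text):
    """Return zero-based positions of the records reported as failed."""
    indices = []
    for line in stderr_text.strip().split('\n'):
        match = FAILED_RECORD.search(line)
        if match:
            # Reported record numbers are one-based; positions are zero-based.
            indices.append(int(match.group(1)) - 1)
    return indices


# Hypothetical stderr text, made up for illustration only.
sample = (
    'Error in molecule No. 2: could not be imported\n'
    'some unrelated warning\n'
    'Error in molecule No. 5: could not be imported\n'
)
print(parse_failed_indices(sample))  # [1, 4]
print(parse_failed_indices(''))      # []
```

Returning a list in every case, including for empty stderr, keeps the later `if errs:` check and `X.index.delete(errs)` call working on one type.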


Issues reported for this file, by source line (0 ignored issues):

Line 1: #! /usr/bin/env python
    Bug, introduced 2016-06-12 20:26 UTC. Seven cyclic imports are reported:
        skchem.core.atom -> skchem.core.base
        skchem.core -> skchem.core.mol
        skchem -> skchem.descriptors -> skchem.descriptors.physicochemical -> skchem.descriptors.fingerprints
        skchem -> skchem.descriptors -> skchem.descriptors.fingerprints
        skchem.core -> skchem.core.bond
        skchem -> skchem.cross_validation -> skchem.cross_validation.similarity_threshold -> skchem.descriptors -> skchem.descriptors.physicochemical -> skchem.descriptors.fingerprints
        skchem -> skchem.target_prediction -> skchem.target_prediction.PIDGIN -> skchem.descriptors -> skchem.descriptors.physicochemical -> skchem.descriptors.fingerprints
    Cyclic imports may cause partly loaded modules to be returned. This might lead to unexpected runtime behavior which is hard to debug.

Line 19: logger = logging.getLogger(__name__)
    Coding Style (Naming), introduced 2016-06-07 13:27 UTC. The name `logger` does not conform to the constant naming conventions (`(([A-Z_][A-Z0-9_]*)|(__.*__))$`). This check looks for invalid names for a range of different identifiers. You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements. If your project includes a Pylint configuration file, the settings contained in that file take precedence.

Line 21: import numpy as np
    Configuration, introduced 2016-06-12 20:26 UTC. The import `numpy` could not be resolved. This can be caused by one of the following:
    1. Missing Dependencies. This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.
        # .scrutinizer.yml
        before_commands:
            - sudo pip install abc # Python2
            - sudo pip3 install abc # Python3
       Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.
    2. Missing __init__.py files. This error could also result from missing `__init__.py` files in your module folders. Make sure that you place one file in each sub-folder.
    Unused Code, introduced 2016-06-12 20:26 UTC. Unused numpy imported as np.

Line 22: import pandas as pd
    Configuration, introduced 2016-06-07 13:27 UTC. The import `pandas` could not be resolved. The possible causes and remedies are the same as for `numpy` on line 21.

Line 39: keep_failed=False):
    Coding Style, introduced 2016-06-12 20:26 UTC. Wrong continued indentation.

Line 78: def _transform_mols(self, X, by='sdf'):
    Coding Style (Naming), introduced 2016-06-07 13:27 UTC. The names `X` and `by` do not conform to the argument naming conventions (`[a-z_][a-z0-9_]{2,30}$`).

Line 86: def _transform_smis(self, X):
    Coding Style (Naming), introduced 2016-06-12 20:26 UTC. The name `X` does not conform to the argument naming conventions (`[a-z_][a-z0-9_]{2,30}$`).

Lines 95-98: the continuation lines of the `args` list in `_transform_file` ('-c', '-f', '-o', '--ignore-error')
    Coding Style, introduced 2016-06-12 20:26 UTC. Wrong continued indentation, reported once for each of the four lines.

Line 109: def _transform_ser(self, X, y=None):
    Coding Style (Naming), introduced 2016-06-07 13:27 UTC. The names `X` and `y` do not conform to the argument naming conventions (`[a-z_][a-z0-9_]{2,30}$`).
    Unused Code, introduced 2016-06-07 13:27 UTC. The argument `y` seems to be unused.

Line 111: # TODO: try using different serializations
    Coding Style, introduced 2016-06-12 20:26 UTC. `TODO` and `FIXME` comments should generally be avoided.
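
The naming issues reported above can be reproduced with a short check. This is a sketch under stated assumptions: the argument pattern is copied from the report, the constant pattern is taken to be the Pylint default of that era (its asterisks appear to have been lost in the page rendering), the helper `conforms` is hypothetical, and matching from the start of the name with `re.match` is assumed to mirror how Pylint applies these patterns.

```python
import re

# Argument pattern as quoted in the report.
ARGUMENT_RGX = re.compile(r'[a-z_][a-z0-9_]{2,30}$')
# Constant pattern, assumed to be the Pylint default of that era.
CONST_RGX = re.compile(r'(([A-Z_][A-Z0-9_]*)|(__.*__))$')


def conforms(pattern, name):
    """Return True if the whole name matches the naming pattern."""
    return pattern.match(name) is not None


# Argument names need a lowercase or underscore start and 3 to 31 characters.
for name in ('X', 'y', 'by', 'mols'):
    print(name, conforms(ARGUMENT_RGX, name))
# X False, y False, by False, mols True

# Module-level names are checked against the constant pattern.
for name in ('logger', 'LOGGER', '__version__'):
    print(name, conforms(CONST_RGX, name))
# logger False, LOGGER True, __version__ True
```

Names such as `X` and `y` are a common convention in scikit-learn style APIs, so a project may prefer to relax these patterns in its own Pylint configuration file rather than rename the arguments.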
