Completed
Push — master ( e4a84f...d41f63 )
by Rich
01:35
created

ChemAxonStandardizer.transform()   B

Complexity
    Conditions: 4

Size
    Total Lines: 25

Duplication
    Lines: 0
    Ratio: 0 %

Importance
    Changes: 2
    Bugs: 0
    Features: 1

Metric  Value
c       2
b       0
f       1
dl      0
loc     25
rs      8.5806
cc      4
#! /usr/bin/env python
Bug: There seems to be a cyclic import. The following cycles were reported:

skchem.core.atom -> skchem.core.base
skchem.core -> skchem.core.mol
skchem -> skchem.descriptors -> skchem.descriptors.physicochemical -> skchem.descriptors.fingerprints
skchem -> skchem.descriptors -> skchem.descriptors.fingerprints
skchem.core -> skchem.core.bond
skchem -> skchem.cross_validation -> skchem.cross_validation.similarity_threshold -> skchem.descriptors -> skchem.descriptors.physicochemical -> skchem.descriptors.fingerprints
skchem -> skchem.target_prediction -> skchem.target_prediction.PIDGIN -> skchem.descriptors -> skchem.descriptors.physicochemical -> skchem.descriptors.fingerprints

Cyclic imports may cause partly loaded modules to be returned. This might lead to unexpected runtime behavior which is hard to debug.
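To make "partly loaded" concrete, here is a minimal, self-contained sketch. It writes two throwaway modules, `cyc_a` and `cyc_b` (hypothetical names, unrelated to skchem), that import each other, and records what `cyc_b` can see of `cyc_a` while `cyc_a` is still executing. It only illustrates the general mechanism; it does not reproduce the skchem cycles listed in the report.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_cyclic_import():
    """Import two mutually dependent throwaway modules and report what the
    inner module saw of the outer one while it was half initialised."""
    with tempfile.TemporaryDirectory() as tmp:
        # cyc_a imports cyc_b *before* it defines VALUE.
        Path(tmp, "cyc_a.py").write_text("import cyc_b\nVALUE = 1\n")
        # cyc_b imports cyc_a back and checks whether VALUE exists yet.
        Path(tmp, "cyc_b.py").write_text(
            "import cyc_a\nSAW_VALUE = hasattr(cyc_a, 'VALUE')\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            cyc_a = importlib.import_module("cyc_a")
            cyc_b = sys.modules["cyc_b"]
            return cyc_b.SAW_VALUE, hasattr(cyc_a, "VALUE")
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("cyc_a", None)
            sys.modules.pop("cyc_b", None)


saw_during_import, present_after_import = demo_cyclic_import()
print(saw_during_import, present_after_import)  # False True
```

While `cyc_b` runs, `cyc_a` is already registered but has not yet reached the line that defines `VALUE`, so the attribute is missing at that moment and present afterwards. Common remedies are to defer the import into the function that needs it or to move the shared code into a third module; whether either suits this package's layout is not something the report tells us.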
#
# Copyright (C) 2016 Rich Lewis <[email protected]>
# License: 3-clause BSD

"""
## skchem.standardizers.chemaxon

Module wrapping ChemAxon Standardizer.  Must have standardizer installed and
license activated.
"""

import os
import re
from tempfile import NamedTemporaryFile
import subprocess
import logging

logger = logging.getLogger(__name__)
Coding Style (Naming): The name logger does not conform to the constant naming conventions ((([A-Z_][A-Z0-9_]*)|(__.*__))$).

This check looks for invalid names for a range of different identifiers. You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements. If your project includes a Pylint configuration file, the settings contained in that file take precedence.
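As a quick way to see why a given name is flagged, the sketch below applies the two patterns quoted in this report (the constant pattern from the message above and the argument pattern from the later messages about `X`, `y` and `by`) to a few names. The helper `conforms` is a hypothetical name introduced here; it assumes the check anchors the pattern at the start of the name, which is how `re.match` behaves.

```python
import re

# Patterns copied verbatim from the messages in this report.
CONST_RGX = re.compile(r"(([A-Z_][A-Z0-9_]*)|(__.*__))$")
ARGUMENT_RGX = re.compile(r"[a-z_][a-z0-9_]{2,30}$")


def conforms(name, pattern):
    """Return True if `name` matches `pattern` from its first character."""
    return pattern.match(name) is not None


for name, pattern in [("logger", CONST_RGX), ("LOGGER", CONST_RGX),
                      ("X", ARGUMENT_RGX), ("by", ARGUMENT_RGX),
                      ("mols", ARGUMENT_RGX)]:
    print(name, conforms(name, pattern))
```

Under these patterns `logger` fails because module-level names are expected in upper case, while `X` and `by` fail because argument names must be lower case and at least three characters long.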
import numpy as np
Configuration: The import numpy could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3

Tip: We are currently not using virtualenv to run pylint. When installing your modules, make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Unused Code: Unused numpy imported as np
import pandas as pd
Configuration: The import pandas could not be resolved.

from .. import core
from .. import io

# ideally we will programmatically build this file, but for now just use it.
DEFAULT_CONFIG = os.path.join(os.path.dirname(__file__), 'default_config.xml')

class ChemAxonStandardizer(object):

    """ Object wrapping the ChemAxon Standardizer, for standardizing molecules.

    Args:
        config_path (str):
            The path of the config_file. If None, use the default one.
        warn_on_fail (bool):
            Whether to log a warning for each molecule that fails.
        error_on_fail (bool):
            Whether to raise a ValueError when a molecule fails.
        keep_failed (bool):
            Whether to keep the unstandardized input for failed molecules
            in the output, rather than dropping them.
    """
    def __init__(self, config_path=None, warn_on_fail=True, error_on_fail=False,
                    keep_failed=False):
Coding Style: Wrong continued indentation.

        if not config_path:
            config_path = DEFAULT_CONFIG
        self.config_path = config_path
        self.keep_failed = keep_failed
        self.error_on_fail = error_on_fail
        self.warn_on_fail = warn_on_fail

    def transform(self, obj):

        """ Standardize compounds.

        Args:
            obj (skchem.Mol, pd.Series or pd.DataFrame):
                A Mol, or a series (or dataframe with a `structure` column)
                of Mols or SMILES strings. A bare string is not dispatched.

        Returns:
            skchem.Mol or pd.Series or pd.DataFrame:
                The standardized molecule, or molecules as a series or
                dataframe.
        """

        if isinstance(obj, core.Mol):
            return self._transform_mol(obj)
        elif isinstance(obj, pd.Series):
            return self._transform_ser(obj)
        elif isinstance(obj, pd.DataFrame):
            res = self._transform_ser(obj.structure)
            return res.to_frame(name='structure').join(obj.drop('structure', axis=1))

        else:
            raise NotImplementedError

    def _transform_mol(self, mol):
        mol = pd.DataFrame([mol], index=[mol.name], columns=['structure'])
        return self.transform(mol).structure.iloc[0]

    def _transform_mols(self, X, by='sdf'):
Coding Style (Naming): The name X does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style (Naming): The name by does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).

        with NamedTemporaryFile() as f_in, NamedTemporaryFile() as f_out:
            getattr(io, 'write_' + by)(X, f_in.name)
            errs = self._transform_file(f_in.name, f_out.name)
            out = io.read_sdf(f_out.name).structure
        return out, errs

    def _transform_smis(self, X):
Coding Style (Naming): The name X does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
        with NamedTemporaryFile() as f_in, NamedTemporaryFile() as f_out:
            X.to_csv(f_in.name, header=None, index=None)
            errs = self._transform_file(f_in.name, f_out.name)
            out = io.read_sdf(f_out.name).structure
        return out, errs

    def _transform_file(self, f_in, f_out):
        args = ['standardize', f_in,
                         '-c', self.config_path,
                         '-f', 'sdf',
                         '-o', f_out,
                         '--ignore-error']
Coding Style: Wrong continued indentation (reported for each of the four continuation lines above).
        logger.debug('Running %s', ' '.join(args))
        sub = subprocess.Popen(args, stderr=subprocess.PIPE)
        # communicate() waits for the process to exit, so the output file is
        # complete before the caller reads it.
        _, stderr = sub.communicate()
        stderr = stderr.decode('ascii', errors='replace')
        errs = []
        if stderr:
            logger.debug('stderr from Standardizer: \n%s', stderr)
            # Failure lines are matched as 'No. <n>:'; n is converted to a
            # zero-based position.
            matches = [re.findall(r'No\. ([0-9]+):', line)
                       for line in stderr.strip().split('\n')]
            errs = [int(match[0]) - 1 for match in matches if match]
        return errs

    def _transform_ser(self, X, y=None):
Coding Style (Naming): The name X does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style (Naming): The name y does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
Unused Code: The argument y seems to be unused.

        # TODO: try using different serializations
Coding Style: TODO and FIXME comments should generally be avoided.
        if isinstance(X.iloc[0], core.Mol):
            out, errs = self._transform_mols(X)
        elif isinstance(X.iloc[0], str):
            out, errs = self._transform_smis(X)
        else:
            # Without this branch, `out` and `errs` would be unbound below.
            raise NotImplementedError
        if errs:
            out.index = X.index.delete(errs)
            for err in errs:
                err = X.index[err]
                if self.error_on_fail:
                    raise ValueError('{} failed to standardize'.format(err))
                if self.warn_on_fail:
                    logger.warning('%s failed to standardize', err)
        else:
            out.index = X.index
        if self.keep_failed:
            out_c = X.copy()
            out_c.loc[out.index] = out
            return out_c
        else:
            return out
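
The error handling in `_transform_file` hinges on turning stderr lines into zero-based record positions. The sketch below isolates that step so it can be tried without ChemAxon installed. The function name `failed_positions` and the sample stderr text are invented for illustration; only the `No. <n>:` pattern and the subtraction of one come from the listing, and the real Standardizer messages may look different.

```python
import re

FAILURE_RGX = re.compile(r"No\. ([0-9]+):")


def failed_positions(stderr_text):
    """Return zero-based positions of records named in 'No. <n>:' lines."""
    positions = []
    for line in stderr_text.strip().split("\n"):
        match = FAILURE_RGX.search(line)
        if match:
            # The listing treats the reported number as 1-based.
            positions.append(int(match.group(1)) - 1)
    return positions


# Invented stderr text, for illustration only.
sample = (
    "No. 2: could not import molecule\n"
    "some unrelated warning\n"
    "No. 5: could not import molecule\n"
)
print(failed_positions(sample))  # [1, 4]
```

Lines that do not match the pattern are ignored, and empty stderr yields an empty list, so the caller can test the result with a plain `if errs:`.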
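
The index bookkeeping in `_transform_ser` (drop the failed positions, relabel the survivors, optionally fall back to the input for failures) can be followed more easily on plain Python containers. The sketch below mirrors that logic with lists and dicts rather than pandas objects; `realign` is a hypothetical helper, it assumes labels are unique, and the "(std)" strings merely stand in for standardized structures.

```python
def realign(labels, inputs, outputs, failed, keep_failed=False):
    """Map each label to its standardized output.

    labels, inputs: original labels and values, in order.
    outputs: results for the records that succeeded, in order.
    failed: zero-based positions (into labels) of the failures.
    """
    failed = set(failed)
    surviving = [label for pos, label in enumerate(labels)
                 if pos not in failed]
    if len(surviving) != len(outputs):
        raise ValueError("outputs do not line up with surviving labels")
    result = dict(zip(surviving, outputs))
    if keep_failed:
        # Fall back to the untouched input for every failed record and
        # restore the original ordering.
        originals = dict(zip(labels, inputs))
        result = {label: result.get(label, originals[label])
                  for label in labels}
    return result


labels = ["a", "b", "c"]
inputs = ["CCO", "not-a-smiles", "c1ccccc1"]
outputs = ["CCO (std)", "c1ccccc1 (std)"]
print(realign(labels, inputs, outputs, failed=[1]))
print(realign(labels, inputs, outputs, failed=[1], keep_failed=True))
```

The length check makes a mismatch between reported failures and returned records fail loudly instead of silently shifting labels, which is the main risk when positions are parsed from stderr.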