Completed
Push — master ( 01edc4...2b1c29 )
by Rich
01:29
created

ChemAxonStandardizer.transform()   B

Complexity

Conditions 4

Size

Total Lines 25

Duplication

Lines 0
Ratio 0 %

Importance

Changes 2
Bugs 0 Features 1
Metric Value
c 2
b 0
f 1
dl 0
loc 25
rs 8.5806
cc 4
#! /usr/bin/env python
#
# Copyright (C) 2016 Rich Lewis <[email protected]>
# License: 3-clause BSD

"""
## skchem.standardizers.chemaxon

Module wrapping ChemAxon Standardizer.  Must have standardizer installed and
license activated.
"""

import time
import os
import re
from tempfile import NamedTemporaryFile
import subprocess
import logging

import pandas as pd
Configuration: The import pandas could not be resolved. This can be caused by missing dependencies in the Pylint environment or by missing __init__.py files in the module folders.

from .. import core
from .. import io
from ..utils import NamedProgressBar, sdf_count

LOGGER = logging.getLogger(__name__)


class ChemAxonStandardizer(object):

    """ Object wrapping the ChemAxon Standardizer, for standardizing molecules.

    Args:
        config_path (str):
            The path of the config file. If None, use the default one.
        warn_on_fail (bool):
            Whether to log a warning for each compound that fails.
        error_on_fail (bool):
            Whether to raise a ValueError when a compound fails.
        keep_failed (bool):
            Whether to keep the original entry for compounds that fail.

    Note:
        ChemAxon Standardizer must be installed and accessible as `standardize`
        from the shell launching the program.

    Warning:
        When using standardizer on smiles, it is currently unsupported if any
        of the compounds fail to subsequently parse.
    """

    # ideally we will programmatically build this file, but for now just use it.
    DEFAULT_CONFIG = os.path.join(os.path.dirname(__file__), 'default_config.xml')

    def __init__(self, config_path=None, warn_on_fail=True, error_on_fail=False,
                 keep_failed=False):

        if not config_path:
            config_path = self.DEFAULT_CONFIG
        self.config_path = config_path
        self.keep_failed = keep_failed
        self.error_on_fail = error_on_fail
        self.warn_on_fail = warn_on_fail

    def transform(self, obj):

        """ Standardize compounds.

        Args:
            obj (str, skchem.Mol, pd.Series or pd.DataFrame):
                The object to standardize: a SMILES string, a Mol, or a
                series or dataframe of these.

        Returns:
            skchem.Mol or pd.Series or pd.DataFrame:
                The standardized molecule, or molecules as a series or
                dataframe.
        """

        if isinstance(obj, core.Mol):
            return self._transform_mol(obj)
        elif isinstance(obj, pd.Series):
            return self._transform_ser(obj)
        elif isinstance(obj, pd.DataFrame):
            # standardize the structure column, then reattach the other columns
            res = self._transform_ser(obj.structure)
            return res.to_frame(name='structure').join(obj.drop('structure', axis=1))
        else:
            raise NotImplementedError(
                'Cannot standardize object of type {}'.format(type(obj).__name__))

    def _transform_mol(self, mol):
        # wrap the single Mol in a one-row dataframe and reuse the batch path
        mol = pd.DataFrame([mol], index=[mol.name], columns=['structure'])
        return self.transform(mol).structure.iloc[0]

    def _transform_mols(self, X):
Coding Style (Naming): The name X does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$). You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements. If your project includes a Pylint configuration file, the settings contained in that file take precedence.
        with NamedTemporaryFile(suffix='.sdf') as f_in, NamedTemporaryFile() as f_out:
            io.write_sdf(X, f_in.name)
            errs = self._transform_file(f_in.name, f_out.name, length=len(X))
            out = io.read_sdf(f_out.name).structure
        return out, errs

    def _transform_smis(self, X):
Coding Style (Naming): The name X does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
        with NamedTemporaryFile() as f_in, NamedTemporaryFile() as f_out:
            # write one SMILES per line, without header or index
            X.to_csv(f_in.name, header=False, index=False)
            LOGGER.debug('Input file length: %s', len(X))
            errs = self._transform_file(f_in.name, f_out.name, length=len(X))
            out = io.read_sdf(f_out.name).structure
            LOGGER.debug('Output file length: %s', len(out))
        return out, errs

    def _transform_file(self, f_in, f_out, length=None):
        args = ['standardize', f_in,
                '-c', self.config_path,
                '-f', 'sdf',
                '-o', f_out,
                '--ignore-error']
        LOGGER.debug('Running %s', ' '.join(args))
        p = subprocess.Popen(args, stderr=subprocess.PIPE)
Coding Style (Naming): The name p does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
        if length is None:
            length = sdf_count(f_in)
        bar = NamedProgressBar(name=self.__class__.__name__, max_value=length)
Blacklisted name "bar".
        while p.poll() is None:
            bar.update(sdf_count(f_out))
Bug: The instance of NamedProgressBar does not seem to have a member named update. This check looks for calls to members that are non-existent. These calls will fail. The member could have been renamed or removed.
            time.sleep(1)
        bar.update(length)
Bug: The instance of NamedProgressBar does not seem to have a member named update.
        p.wait()

        errs = p.stderr.read().decode()
        if len(errs):
            LOGGER.debug('stderr from Standardizer: \n%s', errs)
            errs = errs.strip().split('\n')
            # pull the 1-based record number out of each error line
            errs = [re.findall(r'No\. ([0-9]+):', err) for err in errs]
            # convert to 0-based positions, skipping lines without a number
            errs = [int(err[0]) - 1 for err in errs if len(err)]
        else:
            LOGGER.debug('no errors')
            errs = []
        return errs

    def _transform_ser(self, X, y=None):
Coding Style (Naming): The name X does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style (Naming): The name y does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
Unused Code: The argument y seems to be unused.

        # TODO: try using different serializations
Coding Style: TODO and FIXME comments should generally be avoided.

        LOGGER.info('X type %s', type(X))
        LOGGER.info('X.iloc[0] type %s', type(X.iloc[0]))
        if isinstance(X.iloc[0], core.Mol):
            out, errs = self._transform_mols(X)
        elif isinstance(X.iloc[0], str):
            out, errs = self._transform_smis(X)
        else:
            raise RuntimeError(
                'Expected a series of Mol or str, got {}'.format(type(X.iloc[0])))
        if errs:
            # failed records are missing from the output, so drop their labels
            out.index = X.index.delete(errs)
            for err in errs:
                err = X.index[err]
                if self.error_on_fail:
                    raise ValueError('{} failed to standardize'.format(err))
                if self.warn_on_fail:
                    LOGGER.warning('%s failed to standardize', err)
        else:
            out.index = X.index
        if self.keep_failed:
            # start from the inputs and overwrite the ones that succeeded
            out_c = X.copy()
            out_c.loc[out.index] = out
            return out_c
        else:
            return out
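
The stderr handling in `_transform_file` can be tried in isolation. The sketch below is a minimal stand-alone version of that parsing step, not the module's own code: the helper name `failed_indices` and the sample messages are hypothetical (the exact wording of Standardizer's error output is not shown in this file), and only the `No. <n>:` pattern is taken from the listing above.

```python
import re


def failed_indices(stderr_text):
    """Return the zero-based positions of records reported as failed.

    Assumes each failure line contains 'No. <n>:' with a one-based
    record number, as the regular expression in the listing implies.
    """
    indices = []
    for line in stderr_text.strip().splitlines():
        match = re.search(r'No\. ([0-9]+):', line)
        if match:  # lines without a record number are ignored
            indices.append(int(match.group(1)) - 1)
    return indices


# Hypothetical stderr text, for illustration only.
sample = (
    "Error in molecule No. 2: cannot parse\n"
    "some unrelated warning\n"
    "Error in molecule No. 5: bad valence\n"
)
print(failed_indices(sample))  # [1, 4]
print(failed_indices(""))      # []
```

Returning an empty list when there is no stderr output keeps the return type uniform, so callers can rely on `if errs:` without caring whether they received a string or a list.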
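
The index bookkeeping at the end of `_transform_ser` (drop the labels of failed records, then optionally merge the results back over the inputs) may be easier to follow without pandas. The following is a rough pure-Python analogue under stated assumptions: `realign` is a hypothetical helper, labels are assumed to be unique, and plain dictionaries stand in for `pd.Series`.

```python
def realign(labels, outputs, failed, originals=None):
    """Match outputs to input labels, skipping failed positions.

    labels    -- input labels, in input order (assumed unique)
    outputs   -- results for the records that succeeded, in order
    failed    -- zero-based positions of the records that failed
    originals -- if given, failed records keep their original value
                 (mirrors keep_failed=True in the listing)
    """
    failed = set(failed)
    surviving = [label for pos, label in enumerate(labels)
                 if pos not in failed]
    if len(surviving) != len(outputs):
        raise ValueError('output count does not match surviving inputs')
    result = dict(zip(surviving, outputs))
    if originals is None:
        return result
    merged = dict(zip(labels, originals))  # start from the inputs
    merged.update(result)                  # overwrite the successes
    return merged


labels = ['a', 'b', 'c', 'd']
originals = ['CCO', 'bad', 'c1ccccc1', 'C']
outputs = ['CCO*', 'c1ccccc1*', 'C*']  # record 'b' (position 1) failed

print(realign(labels, outputs, [1]))
# {'a': 'CCO*', 'c': 'c1ccccc1*', 'd': 'C*'}
print(realign(labels, outputs, [1], originals))
# {'a': 'CCO*', 'b': 'bad', 'c': 'c1ccccc1*', 'd': 'C*'}
```

The length check has no counterpart in the listing; it is added here only to make a mismatch between reported failures and actual outputs fail loudly rather than misalign silently.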