Completed
Push — master ( 6e14ce...512eb0 )
by Rich
02:12
created

RDKFeaturizer.__init__()   A

Complexity: Conditions 1
Size: Total Lines 47
Duplication: Lines 0, Ratio 0 %
Importance: Changes 0

Metric  Value
c       0
b       0
f       0
dl      0
loc     47
rs      9.0303
cc      1

#! /usr/bin/env python
#
# Copyright (C) 2007-2009 Rich Lewis <[email protected]>
# License: 3-clause BSD

"""
## skchem.descriptors.fingerprints

Fingerprinting classes and associated functions are defined.
"""

import pandas as pd
Configuration: The import pandas could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a Pylint configuration issue. Make sure that your libraries are available by adding the necessary install commands:

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3

Tip: Pylint is currently not run inside a virtualenv, so when installing your modules, make sure to use the pip command for the correct Python version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that each sub-folder contains one.

from rdkit.Chem import GetDistanceMatrix
Configuration: The import rdkit.Chem could not be resolved.
from rdkit.DataStructs import ConvertToNumpyArray
Configuration: The import rdkit.DataStructs could not be resolved.
from rdkit.Chem.rdMolDescriptors import (GetMorganFingerprint,
Configuration: The import rdkit.Chem.rdMolDescriptors could not be resolved.
                                         GetHashedMorganFingerprint,
                                         GetMorganFingerprintAsBitVect,
                                         GetAtomPairFingerprint,
                                         GetHashedAtomPairFingerprint,
                                         GetHashedAtomPairFingerprintAsBitVect,
                                         GetTopologicalTorsionFingerprint,
                                         GetHashedTopologicalTorsionFingerprint,
                                         GetHashedTopologicalTorsionFingerprintAsBitVect,
                                         GetMACCSKeysFingerprint,
                                         GetFeatureInvariants,
                                         GetConnectivityInvariants)
from rdkit.Chem.rdReducedGraphs import GetErGFingerprint
Configuration: The import rdkit.Chem.rdReducedGraphs could not be resolved.
from rdkit.Chem.rdmolops import RDKFingerprint
Configuration: The import rdkit.Chem.rdmolops could not be resolved.

import numpy as np
Configuration: The import numpy could not be resolved.
from ..base import Transformer, Featurizer


class MorganFeaturizer(Transformer, Featurizer):
Best practice: Too many instance attributes (8/7)
    """ Morgan fingerprints, implemented by RDKit.

    Notes:

        Currently, folded bits are by far the fastest implementation.

    Examples:

        >>> import skchem
        >>> import pandas as pd
        >>> pd.options.display.max_rows = pd.options.display.max_columns = 5

        >>> mf = skchem.descriptors.MorganFeaturizer()
        >>> m = skchem.Mol.from_smiles('CCC')

        Can transform an individual molecule to yield a Series:

        >>> mf.transform(m)
        morgan_fp_idx
        0       0
        1       0
               ..
        2046    0
        2047    0
        Name: MorganFeaturizer, dtype: uint8

        Can transform a list of molecules to yield a DataFrame:

        >>> mf.transform([m])
        morgan_fp_idx  0     1     ...   2046  2047
        0                 0     0  ...      0     0
        <BLANKLINE>
        [1 rows x 2048 columns]

        Change the number of features the fingerprint is folded down to using `n_feats`:

        >>> mf.n_feats = 1024
        >>> mf.transform(m)
        morgan_fp_idx
        0       0
        1       0
               ..
        1022    0
        1023    0
        Name: MorganFeaturizer, dtype: uint8

        Count fingerprints with `as_bits = False`:

        >>> mf.as_bits = False
        >>> res = mf.transform(m); res[res > 0]
        morgan_fp_idx
        33     2
        80     1
        294    2
        320    1
        Name: MorganFeaturizer, dtype: int64

        The pseudo-gradient from `grad` shows which atoms contributed to which feature:

        >>> mf.grad(m)[res > 0]
        atom_idx  0  1  2
        features
        33        1  0  1
        80        0  1  0
        294       1  2  1
        320       1  1  1

    """
    def __init__(self, radius=2, n_feats=2048, as_bits=True, use_features=False,
Best practice: Too many arguments (7/5)
                 use_bond_types=True, use_chirality=False, **kwargs):

        """ Initialize the fingerprinter object.

        Args:
            radius (int):
                The maximum radius for atom environments.
                Default is `2`.
            n_feats (int):
                The number of features to which to fold the fingerprint down.
                For unfolded, use `-1`.
                Default is `2048`.
            as_bits (bool):
                Whether to return bits (`True`) or counts (`False`).
                Default is `True`.
            use_features (bool):
                Whether to map atom types to generic features (FCFP analog).
                Default is `False`.
            use_bond_types (bool):
                Whether to use bond types to differentiate environments.
                Default is `True`.
            use_chirality (bool):
                Whether to use chirality to differentiate environments.
                Default is `False`.
        """

        super(MorganFeaturizer, self).__init__(**kwargs)
        self.radius = radius
        self.n_feats = n_feats
        self.sparse = self.n_feats < 0
        self.as_bits = as_bits
        self.use_features = use_features
        self.use_bond_types = use_bond_types
        self.use_chirality = use_chirality

    def _transform_mol(self, mol):

        """Private method to transform a skchem molecule.

        Use `transform` for the public method, which genericizes the argument to
        iterables of mols.

        Args:
            mol (skchem.Mol): Molecule to calculate fingerprint for.

        Returns:
            np.array or dict:
                Fingerprint as an array (or a dict if sparse).
        """

        if self.as_bits and self.n_feats > 0:

            fp = GetMorganFingerprintAsBitVect(mol, self.radius,
Coding Style (Naming): The name fp does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.
                                               nBits=self.n_feats,
                                               useFeatures=self.use_features,
                                               useBondTypes=self.use_bond_types,
                                               useChirality=self.use_chirality)
            res = np.array(0)
            ConvertToNumpyArray(fp, res)
            res = res.astype(np.uint8)

        else:

            if self.n_feats <= 0:

                res = GetMorganFingerprint(mol, self.radius,
                                           useFeatures=self.use_features,
                                           useBondTypes=self.use_bond_types,
                                           useChirality=self.use_chirality)
                res = res.GetNonzeroElements()
                if self.as_bits:
                    res = {k: int(v > 0) for k, v in res.items()}

            else:
                res = GetHashedMorganFingerprint(mol, self.radius,
                                                 nBits=self.n_feats,
                                                 useFeatures=self.use_features,
                                                 useBondTypes=self.use_bond_types,
                                                 useChirality=self.use_chirality)
                res = np.array(list(res))

        return res
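
The branches of `_transform_mol` differ in how unfolded environment identifiers end up in a fixed-length vector of `n_feats` entries, and in whether counts are kept (`as_bits = False`) or reduced to presence/absence. A minimal NumPy sketch of that idea, assuming folding by `feature_id % n_feats` (RDKit's internal hashing may differ); `fold_counts` is a hypothetical helper, not part of skchem or RDKit, and the identifiers are made up:

```python
import numpy as np


def fold_counts(sparse_counts, n_feats, as_bits=False):
    """Fold a sparse {feature_id: count} mapping into a dense vector.

    Hypothetical helper: assumes folding by ``feature_id % n_feats``, a common
    scheme that is not necessarily RDKit's exact one.
    """
    dense = np.zeros(n_feats, dtype=np.int64)
    for feature_id, count in sparse_counts.items():
        dense[feature_id % n_feats] += count  # colliding ids share a bin
    if as_bits:
        dense = (dense > 0).astype(np.uint8)  # keep presence/absence only
    return dense


# Made-up unfolded counts (not real Morgan identifiers).
sparse = {5: 2, 1029: 1, 70: 3}
print(fold_counts(sparse, 1024)[[5, 70]])                # [3 3]
print(fold_counts(sparse, 1024, as_bits=True)[[5, 70]])  # [1 1]
```

Identifiers 5 and 1029 collide at 1024 bins, so bin 5 holds their combined count; smaller `n_feats` values make such collisions more likely.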

    @property
    def name(self):
Coding Style: This method should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below, you find an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend reading PEP-257: Docstring Conventions.

Coding Style: This method could be written as a function/class method.

If a method does not access any attributes of the class, it could also be implemented as a function or static method. This can help improve readability. For example

class Foo:
    def some_method(self, x, y):
        return x + y

could be written as

class Foo:
    @classmethod
    def some_method(cls, x, y):
        return x + y
        return 'morg'

    @property
    def columns(self):
        return pd.RangeIndex(self.n_feats, name='morgan_fp_idx')

    def grad(self, mol):

        """ Calculate the pseudo gradient with respect to the atoms.

        The pseudo gradient is the number of times each atom was part of an
        atom environment that set a particular bit.

        Args:
            mol (skchem.Mol):
                The molecule for which to calculate the pseudo gradient.

        Returns:
            pandas.DataFrame:
                DataFrame of pseudo gradients, with columns corresponding to
                atoms, and rows corresponding to features of the fingerprint.
        """

        cols = pd.Index(list(range(len(mol.atoms))), name='atom_idx')
        dist = GetDistanceMatrix(mol)

        info = {}

        if self.n_feats < 0:

            res = GetMorganFingerprint(mol, self.radius,
                                       useFeatures=self.use_features,
                                       useBondTypes=self.use_bond_types,
                                       useChirality=self.use_chirality,
                                       bitInfo=info).GetNonzeroElements()
            idx_list = list(res.keys())
            idx = pd.Index(idx_list, name='features')
            grad = np.zeros((len(idx), len(cols)))
            for bit in info:
                for atom_idx, radius in info[bit]:
                    grad[idx_list.index(bit)] += (dist <= radius)[atom_idx]

        else:

            res = list(GetHashedMorganFingerprint(mol, self.radius,
                                                  nBits=self.n_feats,
                                                  useFeatures=self.use_features,
                                                  useBondTypes=self.use_bond_types,
                                                  useChirality=self.use_chirality,
                                                  bitInfo=info))
            idx = pd.Index(range(self.n_feats), name='features')
            grad = np.zeros((len(idx), len(cols)))

            for bit in info:
                for atom_idx, radius in info[bit]:
                    grad[bit] += (dist <= radius)[atom_idx]

        grad = pd.DataFrame(grad, index=idx, columns=cols)

        if self.as_bits:
            grad = (grad > 0)

        return grad.astype(int)
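
What the `grad` loops compute can be reproduced without RDKit: for every (centre atom, radius) pair recorded for a bit, each atom within that topological radius of the centre gets one count. A minimal sketch for propane; the distance matrix is written by hand in place of `GetDistanceMatrix`, and `bit_info` is a hypothetical mapping chosen to match the rows in the `MorganFeaturizer` docstring, not real RDKit output. `pseudo_grad` is likewise a hypothetical helper:

```python
import numpy as np
import pandas as pd

# Topological distance matrix for propane (atoms C0-C1-C2), written out by
# hand in place of rdkit.Chem.GetDistanceMatrix(mol).
dist = np.array([[0, 1, 2],
                 [1, 0, 1],
                 [2, 1, 0]])

# Hypothetical bitInfo-style mapping {bit: [(centre_atom_idx, radius), ...]},
# chosen by hand to match the docstring example; not generated by RDKit here.
bit_info = {33: [(0, 0), (2, 0)],
            80: [(1, 0)],
            294: [(0, 1), (2, 1)],
            320: [(1, 1)]}


def pseudo_grad(bit_info, dist):
    """Count how often each atom lies inside an environment that set each bit."""
    bits = sorted(bit_info)
    grad = np.zeros((len(bits), dist.shape[0]), dtype=int)
    for row, bit in enumerate(bits):
        for atom_idx, radius in bit_info[bit]:
            # Atoms within `radius` bonds of the centre belong to the environment.
            grad[row] += dist[atom_idx] <= radius
    return pd.DataFrame(grad,
                        index=pd.Index(bits, name='features'),
                        columns=pd.Index(range(dist.shape[0]), name='atom_idx'))


print(pseudo_grad(bit_info, dist))
```

The printed rows match the docstring's `grad` output; with `as_bits = True` the method above additionally reduces the counts to `grad > 0`.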


class AtomPairFeaturizer(Transformer, Featurizer):

    """ Atom Pair Fingerprints, implemented by RDKit. """

    def __init__(self, min_length=1, max_length=30, n_feats=2048, as_bits=False,
Best practice: Too many arguments (6/5)
                 use_chirality=False, **kwargs):

        """ Instantiate an atom pair fingerprinter.

        Args:
            min_length (int):
                The minimum topological distance (in bonds) between the two
                atoms of a pair.
                Default is `1`, i.e. directly bonded atoms can form a pair.
            max_length (int):
                The maximum topological distance (in bonds) between the two
                atoms of a pair.
                Default is `30`.
            n_feats (int):
                The number of features to which to fold the fingerprint down.
                For unfolded, use `-1`.
                Default is `2048`.
            as_bits (bool):
                Whether to return bits (`True`) or counts (`False`).
                Default is `False`.
            use_chirality (bool):
                Whether to use chirality to differentiate environments.
                Default is `False`.
        """

        super(AtomPairFeaturizer, self).__init__(**kwargs)
        self.min_length = min_length
        self.max_length = max_length
        self.n_feats = n_feats
        self.sparse = self.n_feats < 0
        self.as_bits = as_bits
        self.use_chirality = use_chirality

    def _transform_mol(self, mol):

        """Private method to transform a skchem molecule.

        Use `transform` for the public method, which genericizes the argument to
        iterables of mols.

        Args:
            mol (skchem.Mol): Molecule to calculate fingerprint for.

        Returns:
            np.array or dict:
                Fingerprint as an array (or a dict if sparse).
        """

        if self.as_bits and self.n_feats > 0:

            fp = GetHashedAtomPairFingerprintAsBitVect(mol, nBits=self.n_feats,
Coding Style (Naming): The name fp does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
                                           minLength=self.min_length,
Coding Style: Wrong continued indentation.
                                           maxLength=self.max_length,
Coding Style: Wrong continued indentation.
                                           includeChirality=self.use_chirality)
Coding Style: Wrong continued indentation.
            res = np.array(0)
            ConvertToNumpyArray(fp, res)
            res = res.astype(np.uint8)

        else:

            if self.n_feats <= 0:

                res = GetAtomPairFingerprint(mol,
                                               minLength=self.min_length,
Coding Style: Wrong continued indentation.
                                               maxLength=self.max_length,
Coding Style: Wrong continued indentation.
                                               includeChirality=self.use_chirality)
Coding Style: Wrong continued indentation.
                res = res.GetNonzeroElements()
                if self.as_bits:
                    res = {k: int(v > 0) for k, v in res.items()}

            else:
                res = GetHashedAtomPairFingerprint(mol, nBits=self.n_feats,
                                               minLength=self.min_length,
Coding Style: Wrong continued indentation.
                                               maxLength=self.max_length,
Coding Style: Wrong continued indentation.
                                               includeChirality=self.use_chirality)
Coding Style: Wrong continued indentation.
                res = np.array(list(res))

        return res

    @property
    def name(self):
Coding Style: This method should have a docstring.
Coding Style: This method could be written as a function/class method.
        return 'atom_pair'

    @property
    def columns(self):
        return pd.RangeIndex(self.n_feats, name='ap_fp_idx')
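
Atom pairs combine two atoms with the topological distance between them, and `min_length`/`max_length` restrict which distances count. The sketch below shows only that enumeration step, using bare element symbols as atom types; RDKit's real atom invariants are richer (they also encode properties such as heavy-atom degree and pi-electron count) and its features are hashed to integers. `atom_pair_counts` is a hypothetical helper and the ethanol distance matrix is written by hand:

```python
from collections import Counter

import numpy as np


def atom_pair_counts(atom_types, dist, min_length=1, max_length=30):
    """Count (type_a, distance, type_b) triples for atom pairs within range.

    Simplified sketch: plain element symbols stand in for RDKit's atom
    invariants, and no hashing or folding is applied.
    """
    counts = Counter()
    n_atoms = len(atom_types)
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            d = int(dist[i, j])
            if min_length <= d <= max_length:
                # Sort the two types so the feature does not depend on atom order.
                a, b = sorted((atom_types[i], atom_types[j]))
                counts[(a, d, b)] += 1
    return counts


# Ethanol heavy atoms C0-C1-O2 with a hand-written distance matrix.
dist = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
print(atom_pair_counts(['C', 'C', 'O'], dist))
print(atom_pair_counts(['C', 'C', 'O'], dist, min_length=2))
```

Raising `min_length` to 2 drops the two bonded pairs and leaves only the C...O pair two bonds apart.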


class TopologicalTorsionFeaturizer(Transformer, Featurizer):

    """ Topological Torsion fingerprints, implemented by RDKit. """

    def __init__(self, target_size=4, n_feats=2048, as_bits=False,
                 use_chirality=False, **kwargs):

        """ Instantiate a topological torsion fingerprinter.

        Args:
            target_size (int):
                The number of atoms to include in each torsion.
                Default is `4`.
            n_feats (int):
                The number of features to which to fold the fingerprint down.
                For unfolded, use `-1`.
                Default is `2048`.
            as_bits (bool):
                Whether to return bits (`True`) or counts (`False`).
                Default is `False`.
            use_chirality (bool):
                Whether to use chirality to differentiate environments.
                Default is `False`.
        """

        self.target_size = target_size
        self.n_feats = n_feats
        self.sparse = self.n_feats < 0
        self.as_bits = as_bits
        self.use_chirality = use_chirality
        super(TopologicalTorsionFeaturizer, self).__init__(**kwargs)

    def _transform_mol(self, mol):
        """ Private method to transform a skchem molecule.

        Args:
            mol (skchem.Mol): Molecule to calculate fingerprint for.

        Returns:
            np.array or dict:
                Fingerprint as an array (or a dict if sparse).
Duplication: This code seems to be duplicated in your project.
        """

        if self.as_bits and self.n_feats > 0:

            fp = GetHashedTopologicalTorsionFingerprintAsBitVect(mol, nBits=self.n_feats,
Coding Style (Naming): The name fp does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
                                           targetSize=self.target_size,
Coding Style: Wrong continued indentation.
                                           includeChirality=self.use_chirality)
Coding Style: Wrong continued indentation.
            res = np.array(0)
            ConvertToNumpyArray(fp, res)
            res = res.astype(np.uint8)

        else:

            if self.n_feats <= 0:

                res = GetTopologicalTorsionFingerprint(mol,
                                               targetSize=self.target_size,
Coding Style: Wrong continued indentation.
                                               includeChirality=self.use_chirality)
Coding Style: Wrong continued indentation.
                res = res.GetNonzeroElements()
                if self.as_bits:
                    res = {k: int(v > 0) for k, v in res.items()}

            else:
                res = GetHashedTopologicalTorsionFingerprint(mol, nBits=self.n_feats,
                                               targetSize=self.target_size,
Coding Style: Wrong continued indentation.
                                               includeChirality=self.use_chirality)
Coding Style: Wrong continued indentation.
                res = np.array(list(res))

        return res

    @property
    def name(self):
Coding Style: This method should have a docstring.
Coding Style: This method could be written as a function/class method.
        return 'top_tort'

    @property
    def columns(self):
        return pd.RangeIndex(self.n_feats, name='tt_fp_idx')
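
Topological torsions are built from linear sequences of `target_size` bonded atoms (four by default). A minimal sketch of that path enumeration on hand-written adjacency lists; RDKit additionally encodes each path with atom invariants and hashes it, which is omitted here, and `linear_paths` is a hypothetical helper:

```python
def linear_paths(adjacency, target_size=4):
    """Enumerate simple linear paths of `target_size` bonded atoms.

    Each path is reported once, in a canonical direction (the smaller of the
    path and its reverse).
    """
    paths = set()

    def extend(path):
        if len(path) == target_size:
            paths.add(min(tuple(path), tuple(reversed(path))))
            return
        for nbr in adjacency[path[-1]]:
            if nbr not in path:  # simple path: never revisit an atom
                extend(path + [nbr])

    for start in adjacency:
        extend([start])
    return sorted(paths)


# n-butane (C0-C1-C2-C3) and isobutane (C1 bonded to C0, C2 and C3).
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
print(linear_paths(butane))                      # [(0, 1, 2, 3)]
print(linear_paths(isobutane))                   # []
print(linear_paths(isobutane, target_size=3))
```

Isobutane has no four-atom chain, so with the default `target_size` it yields no torsions at all, while `target_size=3` finds its three C-C-C paths.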


class MACCSFeaturizer(Transformer, Featurizer):

    """ MACCS Keys Fingerprints """

    def __init__(self, **kwargs):
        super(MACCSFeaturizer, self).__init__(**kwargs)
        self.n_feats = 166

    def _transform_mol(self, mol):
        # RDKit's MACCS fingerprint has 167 bits with bit 0 unused;
        # drop it to keep the 166 public keys.
        return np.array(list(GetMACCSKeysFingerprint(mol)))[1:]

    @property
    def name(self):
Coding Style: This method should have a docstring.
Coding Style: This method could be written as a function/class method.
        return 'maccs'

    @property
    def columns(self):
        return pd.Index(
            ['ISOTOPE', '103 < ATOMIC NO. < 256', 'GROUP IVA,VA,VIA PERIODS 4-6 (Ge...)', 'ACTINIDE',
0 ignored issues
show
Coding Style introduced by
This line is too long as per the coding-style (101/100).

This check looks for lines that are too long. You can specify the maximum line length.

Loading history...
441
             'GROUP IIIB,IVB (Sc...)', 'LANTHANIDE', 'GROUP VB,VIB,VIIB (V...)', 'QAAA@1', 'GROUP VIII (Fe...)',
0 ignored issues
show
Coding Style introduced by
This line is too long as per the coding-style (112/100).

This check looks for lines that are too long. You can specify the maximum line length.

Loading history...
442
             'GROUP IIA (ALKALINE EARTH)', '4M RING', 'GROUP IB,IIB (Cu...)', 'ON(C)C', 'S-S', 'OC(O)O', 'QAA@1', 'CTC',
0 ignored issues
show
Coding Style introduced by
This line is too long as per the coding-style (120/100).

This check looks for lines that are too long. You can specify the maximum line length.

Loading history...
443
             'GROUP IIIA (B...)', '7M RING', 'SI', 'C=C(Q)Q', '3M RING', 'NC(O)O', 'N-O', 'NC(N)N', 'C$=C($A)$A', 'I',
0 ignored issues
show
Coding Style introduced by
This line is too long as per the coding-style (118/100).

This check looks for lines that are too long. You can specify the maximum line length.

Loading history...
444
             'QCH2Q', 'P', 'CQ(C)(C)A', 'QX', 'CSN', 'NS', 'CH2=A', 'GROUP IA (ALKALI METAL)', 'S HETEROCYCLE',
0 ignored issues
show
Coding Style introduced by
This line is too long as per the coding-style (111/100).

This check looks for lines that are too long. You can specify the maximum line length.

Loading history...
445
             'NC(O)N', 'NC(C)N', 'OS(O)O', 'S-O', 'CTN', 'F', 'QHAQH', 'OTHER', 'C=CN', 'BR', 'SAN', 'OQ(O)O', 'CHARGE',
0 ignored issues
show
Coding Style introduced by
This line is too long as per the coding-style (120/100).

This check looks for lines that are too long. You can specify the maximum line length.

Loading history...
446
             'C=C(C)C', 'CSO', 'NN', 'QHAAAQH', 'QHAAQH', 'OSO', 'ON(O)C', 'O HETEROCYCLE', 'QSQ', 'Snot%A%A', 'S=O',
0 ignored issues
show
Coding Style introduced by
This line is too long as per the coding-style (117/100).

This check looks for lines that are too long. You can specify the maximum line length.

Loading history...
447
             'AS(A)A', 'A$A!A$A', 'N=O', 'A$A!S', 'C%N', 'CC(C)(C)A', 'QS', 'QHQH (&...)', 'QQH', 'QNQ', 'NO', 'OAAO',
             'S=A', 'CH3ACH3', 'A!N$A', 'C=C(A)A', 'NAN', 'C=N', 'NAAN', 'NAAAN', 'SA(A)A', 'ACH2QH', 'QAAAA@1', 'NH2',
             'CN(C)C', 'CH2QCH2', 'X!A$A', 'S', 'OAAAO', 'QHAACH2A', 'QHAAACH2A', 'OC(N)C', 'QCH3', 'QN', 'NAAO',
             '5M RING', 'NAAAO', 'QAAAAA@1', 'C=C', 'ACH2N', '8M RING', 'QO', 'CL', 'QHACH2A', 'A$A($A)$A', 'QA(Q)Q',
             'XA(A)A', 'CH3AAACH2A', 'ACH2O', 'NCO', 'NACH2A', 'AA(A)(A)A', 'Onot%A%A', 'CH3CH2A', 'CH3ACH2A',
             'CH3AACH2A', 'NAO', 'ACH2CH2A > 1', 'N=A', 'HETEROCYCLIC ATOM > 1 (&...)', 'N HETEROCYCLE', 'AN(A)A',
             'OCO', 'QQ', 'AROMATIC RING > 1', 'A!O!A', 'A$A!O > 1 (&...)', 'ACH2AAACH2A', 'ACH2AACH2A',
             'QQ > 1 (&...)', 'QH > 1', 'OACH2A', 'A$A!N', 'X (HALOGEN)', 'Nnot%A%A', 'O=A > 1', 'HETEROCYCLE',
             'QCH2A > 1 (&...)', 'OH', 'O > 3 (&...)', 'CH3 > 2 (&...)', 'N > 1', 'A$A!O', 'Anot%A%Anot%A',
             '6M RING > 1', 'O > 2', 'ACH2CH2A', 'AQ(A)A', 'CH3 > 1', 'A!A$A!A', 'NH', 'OC(C)C', 'QCH2A', 'C=O',
             'A!CH2!A', 'NA(A)A', 'C-O', 'C-N', 'O > 1', 'CH3', 'N', 'AROMATIC', '6M RING', 'O', 'RING', 'FRAGMENTS'],
            name='maccs_idx')


class ErGFeaturizer(Transformer, Featurizer):

    """ Extended Reduced Graph (ErG) fingerprints.

    Implemented in RDKit.

    Args:
        atom_types (int):
            atom typing passed to RDKit as `atomTypes`; the default `0` keeps RDKit's
            built-in ErG atom types.

        fuzz_increment (float):
            fuzziness increment passed to RDKit as `fuzzIncrement`.

        min_path (int):
            minimum path length between reduced-graph nodes to encode.

        max_path (int):
            maximum path length between reduced-graph nodes to encode.
    """

    def __init__(self, atom_types=0, fuzz_increment=0.3, min_path=1, max_path=15, **kwargs):

        super(ErGFeaturizer, self).__init__(**kwargs)
        self.atom_types = atom_types
        self.fuzz_increment = fuzz_increment
        self.min_path = min_path
        self.max_path = max_path
        # one bin per path length in [min_path, max_path] for each of the 21 pairs of
        # the six default ErG property types (assumes default atom_types):
        # 21 * 15 = 315 with the default path range.
        self.n_feats = 21 * (max_path - min_path + 1)

    def _transform_mol(self, mol):

        # forward the stored settings to RDKit so they affect the fingerprint
        return np.array(GetErGFingerprint(mol, atomTypes=self.atom_types,
                                          fuzzIncrement=self.fuzz_increment,
                                          minPath=self.min_path,
                                          maxPath=self.max_path))

    @property
    def name(self):
        """ str: the short name of this featurizer. """
        return 'erg'

    @property
    def columns(self):
        """ pd.RangeIndex: the labels of the fingerprint features. """
        return pd.RangeIndex(self.n_feats, name='erg_fp_idx')
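# Illustrative sketch only (hypothetical helper, not part of skchem or RDKit):
# ErG-style "fuzzy" counting, as we understand the idea behind `fuzz_increment`.
# Assumption: a pharmacophore pair at topological distance `dist` adds 1 to its
# own path-length bin and `fuzz_increment` to each neighbouring bin that lies
# inside [min_path, max_path]. RDKit's C++ implementation may differ in detail.
def _fuzzy_increment(bins, dist, min_path=1, max_path=15, fuzz_increment=0.3):
    """ Count one pair at distance `dist` into the list `bins` (modified in place). """
    if dist < min_path or dist > max_path:
        return bins  # outside the encoded path range: not counted
    idx = dist - min_path  # bin for this distance
    bins[idx] += 1
    if fuzz_increment > 0:
        if dist > min_path:
            bins[idx - 1] += fuzz_increment  # shorter neighbouring distance
        if dist < max_path:
            bins[idx + 1] += fuzz_increment  # longer neighbouring distance
    return bins

# e.g. _fuzzy_increment([0.0] * 5, 3, min_path=1, max_path=5)
# gives [0.0, 0.3, 1.0, 0.3, 0.0]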


class FeatureInvariantsFeaturizer(Transformer, Featurizer):

    """ Feature invariants fingerprints.

    Returns RDKit's per-atom feature (FCFP-like) invariants, so the output has one
    entry per atom rather than a fixed length. """

    def __init__(self, **kwargs):

        super(FeatureInvariantsFeaturizer, self).__init__(**kwargs)

    def _transform_mol(self, mol):

        return np.array(GetFeatureInvariants(mol))

    @property
    def name(self):
        """ str: the short name of this featurizer. """
        return 'feat_inv'

    @property
    def columns(self):
        """ None: the output length varies with the number of atoms. """
        return None


class ConnectivityInvariantsFeaturizer(Transformer, Featurizer):

    """ Connectivity invariants fingerprints.

    Returns RDKit's per-atom connectivity (ECFP-like) invariants. """

    def __init__(self, include_ring_membership=True, **kwargs):
        super(ConnectivityInvariantsFeaturizer, self).__init__(**kwargs)
        self.include_ring_membership = include_ring_membership
        raise NotImplementedError  # this is a sparse descriptor

    def _transform_mol(self, mol):

        return np.array(GetConnectivityInvariants(
            mol, includeRingMembership=self.include_ring_membership))

    @property
    def name(self):
        """ str: the short name of this featurizer. """
        return 'conn_inv'

    @property
    def columns(self):
        """ None: the output length varies with the number of atoms. """
        return None


class RDKFeaturizer(Transformer, Featurizer):

    """ RDKit fingerprint: a Daylight-like fingerprint built by hashing molecular subgraphs. """

    # TODO: finish docstring

    def __init__(self, min_path=1, max_path=7, n_feats=2048, n_bits_per_hash=2,
                 use_hs=True, target_density=0.0, min_size=128,
                 branched_paths=True, use_bond_types=True, **kwargs):

        """ RDK fingerprints

        Args:
            min_path (int):
                minimum number of bonds to include in the subgraphs.

            max_path (int):
                maximum number of bonds to include in the subgraphs.

            n_feats (int):
                the number of bits to which the fingerprint is folded.

            n_bits_per_hash (int):
                number of bits to set per path.

            use_hs (bool):
                include paths involving Hs in the fingerprint if the molecule has explicit Hs.

            target_density (float):
                fold the fingerprint further until this minimum density of set bits is
                reached. Folding can leave fewer than `n_feats` bits.

            min_size (int):
                the minimum size the fingerprint will be folded to when trying to reach
                `target_density`.

            branched_paths (bool):
                if set, both branched and unbranched paths are used in the fingerprint;
                otherwise only linear paths are used.

            use_bond_types (bool):
                if set, bond orders are used in the path hashes.

        """

        super(RDKFeaturizer, self).__init__(**kwargs)

        self.min_path = min_path
        self.max_path = max_path
        self.n_feats = n_feats
        self.n_bits_per_hash = n_bits_per_hash
        self.use_hs = use_hs
        self.target_density = target_density
        self.min_size = min_size
        self.branched_paths = branched_paths
        self.use_bond_types = use_bond_types

    def _transform_mol(self, mol):

        return np.array(list(RDKFingerprint(mol, minPath=self.min_path,
                                            maxPath=self.max_path,
                                            fpSize=self.n_feats,
                                            nBitsPerHash=self.n_bits_per_hash,
                                            useHs=self.use_hs,
                                            tgtDensity=self.target_density,
                                            minSize=self.min_size,
                                            branchedPaths=self.branched_paths,
                                            useBondOrder=self.use_bond_types)))

    @property
    def name(self):
        """ str: the short name of this featurizer. """
        return 'rdkit'

    @property
    def columns(self):
        """ pd.RangeIndex: the labels of the fingerprint bits. """
        return pd.RangeIndex(self.n_feats, name='rdk_fp_idx')
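# Illustrative sketch only (hypothetical helper, not part of skchem or RDKit):
# how `target_density` and `min_size` interact. The bit vector is repeatedly
# folded in half by OR-ing bit i with bit i + n/2 until the fraction of set
# bits reaches `target_density` or another fold would shrink it below
# `min_size`. This approximates, but is not, RDKit's own folding code.
import numpy as np


def _fold_to_density(bits, target_density, min_size):
    """ Fold a 0/1 vector (assumed power-of-two length, min_size >= 1) until dense enough. """
    bits = np.asarray(bits, dtype=bool)
    while bits.mean() < target_density and len(bits) >= 2 * min_size:
        half = len(bits) // 2
        # bit i of the folded vector is set if bit i or bit i + half was set
        bits = bits[:half] | bits[half:]
    return bits.astype(int)

# e.g. list(_fold_to_density([1, 0, 0, 0, 0, 0, 0, 1], 0.5, 2)) == [1, 0, 0, 1]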