MorganFeaturizer.grad()   C

Complexity

Conditions 7

Size

Total Lines 57

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 14
CRAP Score 9.3554

Importance

Changes 2
Bugs 0 Features 0
Metric Value
c 2
b 0
f 0
dl 0
loc 57
ccs 14
cts 22
cp 0.6364
rs 6.6397
cc 7
crap 9.3554
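
The CRAP score in the table can be reproduced from the other metrics, assuming the report uses the standard Change Risk Anti-Patterns formula, CRAP = cc^2 * (1 - coverage)^3 + cc. The page itself does not state the formula, so the function below is only an illustrative sketch, and the name `crap_score` is hypothetical. With `cc = 7` and `cp = 0.6364` it yields the reported 9.3554.

```python
def crap_score(complexity, coverage):
    """Standard CRAP formula: comp^2 * (1 - cov)^3 + comp.

    `coverage` is a fraction in [0, 1], not a percentage.
    """
    return complexity ** 2 * (1.0 - coverage) ** 3 + complexity


# Values taken from the metric table: cc = 7, cp = 0.6364.
print(round(crap_score(7, 0.6364), 4))
```

Under this formula a fully covered method scores exactly its cyclomatic complexity, so raising coverage of `grad()` would lower the score toward 7.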

How to fix: Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
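
As a sketch of this advice applied to `grad()`, the accumulation loop that appears in both of its branches could be pulled out into a named helper. The helper name `accumulate_atom_contributions` and its `row_of` parameter are hypothetical and not part of skchem. The `info` dictionary mimics the bit-info mapping filled in by the fingerprint call (bit to `(atom_idx, radius)` pairs), and the distance matrix for propane is written out by hand with plain lists so the example runs without NumPy or RDKit.

```python
def accumulate_atom_contributions(info, dist, n_rows, row_of=None):
    """Count, per feature row, how often each atom lies in an environment
    that set the corresponding bit.

    info:   dict mapping bit -> iterable of (atom_idx, radius) pairs
    dist:   square topological distance matrix (list of lists)
    n_rows: number of feature rows in the result
    row_of: optional dict mapping bit -> row index (unfolded case);
            folded bits index rows directly
    """
    n_atoms = len(dist)
    grad = [[0] * n_atoms for _ in range(n_rows)]
    for bit, environments in info.items():
        row = bit if row_of is None else row_of[bit]
        for atom_idx, radius in environments:
            # Atoms within `radius` bonds of the centre atom belong to the
            # environment that set this bit.
            for other, distance in enumerate(dist[atom_idx]):
                grad[row][other] += distance <= radius
    return grad


# Hand-written distance matrix for propane (C-C-C).
dist = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]
# Made-up bit info: bit 0 is set by radius-0 environments on both terminal
# atoms, bit 1 by a radius-1 environment centred on the middle atom.
info = {0: ((0, 0), (2, 0)), 1: ((1, 1),)}

grad = accumulate_atom_contributions(info, dist, n_rows=2)
print(grad)
```

With a helper like this, each branch of `grad()` would only need to build its index and call the helper, which shortens the method and gives the loop a descriptive name.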

Commonly applied refactorings include Extract Method.

#! /usr/bin/env python
#
# Copyright (C) 2007-2009 Rich Lewis <[email protected]>
# License: 3-clause BSD

"""
## skchem.descriptors.fingerprints

Fingerprinting classes and associated functions are defined.
"""

import pandas as pd
Configuration introduced by
The import pandas could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint; when installing your modules, make sure to use the command for the correct Python version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

from rdkit.Chem import GetDistanceMatrix
Configuration introduced by
The import rdkit.Chem could not be resolved.
from rdkit.DataStructs import ConvertToNumpyArray
Configuration introduced by
The import rdkit.DataStructs could not be resolved.
from rdkit.Chem.rdMolDescriptors import (
Configuration introduced by
The import rdkit.Chem.rdMolDescriptors could not be resolved.
    GetMorganFingerprint,
    GetHashedMorganFingerprint,
    GetMorganFingerprintAsBitVect,
    GetAtomPairFingerprint,
    GetHashedAtomPairFingerprint,
    GetHashedAtomPairFingerprintAsBitVect,
    GetTopologicalTorsionFingerprint,
    GetHashedTopologicalTorsionFingerprint,
    GetHashedTopologicalTorsionFingerprintAsBitVect,
    GetMACCSKeysFingerprint,
    GetFeatureInvariants,
    GetConnectivityInvariants)
from rdkit.Chem.rdReducedGraphs import GetErGFingerprint
Configuration introduced by
The import rdkit.Chem.rdReducedGraphs could not be resolved.
from rdkit.Chem.rdmolops import RDKFingerprint
Configuration introduced by
The import rdkit.Chem.rdmolops could not be resolved.

import numpy as np
Configuration introduced by
The import numpy could not be resolved.
from ..base import Transformer, Featurizer


class MorganFeaturizer(Transformer, Featurizer):
best-practice introduced by
Too many instance attributes (8/7)
    """ Morgan fingerprints, implemented by RDKit.

    Notes:

        Currently, folded bits are by far the fastest implementation.

        Due to the speed of calculation, it is unlikely to see a speedup using
        the current parallel code, as more time is spent moving data across
        processes than for calculating in a single process.

    Examples:

    >>> import skchem
    >>> import pandas as pd
    >>> pd.options.display.max_rows = pd.options.display.max_columns = 5

    >>> mf = skchem.features.MorganFeaturizer()
    >>> m = skchem.Mol.from_smiles('CCC')

    Can transform an individual molecule to yield a Series:

    >>> mf.transform(m)
    morgan_fp_idx
    0       0
    1       0
    ..
    2046    0
    2047    0
    Name: MorganFeaturizer, dtype: uint8

    Can transform a list of molecules to yield a DataFrame:

    >>> mf.transform([m])
    morgan_fp_idx  0     1     ...   2046  2047
    0                 0     0  ...      0     0
    <BLANKLINE>
    [1 rows x 2048 columns]

    Change the number of features the fingerprint is folded down to using
    `n_feats`.

    >>> mf.n_feats = 1024
    >>> mf.transform(m)
    morgan_fp_idx
    0       0
    1       0
    ..
    1022    0
    1023    0
    Name: MorganFeaturizer, dtype: uint8

    Count fingerprints with `as_bits` = False

    >>> mf.as_bits = False
    >>> res = mf.transform(m); res[res > 0]
    morgan_fp_idx
    33     2
    80     1
    294    2
    320    1
    Name: MorganFeaturizer, dtype: int64

    Pseudo-gradient with `grad` shows which atoms contributed to which
    feature.

    >>> mf.grad(m)[res > 0]
    atom_idx  0  1  2
    features
    33        1  0  1
    80        0  1  0
    294       1  2  1
    320       1  1  1

    """
    def __init__(self, radius=2, n_feats=2048, as_bits=True,
best-practice introduced by
Too many arguments (9/5)
                 use_features=False, use_bond_types=True, use_chirality=False,
                 n_jobs=1, verbose=True):

        """ Initialize the fingerprinter object.

        Args:
             radius (int):
                 The maximum radius for atom environments.
                 Default is `2`.

             n_feats (int):
                 The number of features to which to fold the fingerprint down.
                 For unfolded, use `-1`.
                 Default is `2048`.

             as_bits (bool):
                 Whether to return bits (`True`) or counts (`False`).
                 Default is `True`.

             use_features (bool):
                 Whether to map atom types to generic features (FCFP).
                 Default is `False`.

             use_bond_types (bool):
                 Whether to use bond types to differentiate environments.
                 Default is `True`.

             use_chirality (bool):
                 Whether to use chirality to differentiate environments.
                 Default is `False`.

             n_jobs (int):
                 The number of processes to run the featurizer in.

             verbose (bool):
                 Whether to output a progress bar.

        """

        super(MorganFeaturizer, self).__init__(n_jobs=n_jobs, verbose=verbose)
        self.radius = radius
        self.n_feats = n_feats
        self.sparse = self.n_feats < 0
        self.as_bits = as_bits
        self.use_features = use_features
        self.use_bond_types = use_bond_types
        self.use_chirality = use_chirality

    def _transform_mol(self, mol):
Duplication introduced by
This code seems to be duplicated in your project.

        """Private method to transform a skchem molecule.

        Use `transform` for the public method, which genericizes the argument
        to iterables of mols.

        Args:
            mol (skchem.Mol): Molecule to calculate fingerprint for.

        Returns:
            np.array or dict:
                Fingerprint as an array (or a dict if sparse).
        """

        if self.as_bits and self.n_feats > 0:

            fp = GetMorganFingerprintAsBitVect(
Coding Style Naming introduced by
The name fp does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.
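
The pattern quoted in this message requires at least three characters (one leading letter or underscore followed by two to thirty more), which is why the two-letter name `fp` is flagged. A quick check with Python's `re` module illustrates this. The helper name `conforms` and the alternative name `fingerprint` are only examples, not names used by Pylint or skchem.

```python
import re

# Variable-name pattern quoted in the Pylint message.
VARIABLE_RGX = re.compile(r'[a-z_][a-z0-9_]{2,30}$')


def conforms(name):
    """Return True if `name` matches the pattern from its first character."""
    return VARIABLE_RGX.match(name) is not None


for candidate in ('fp', 'fingerprint', 'res'):
    print(candidate, conforms(candidate))
```

Renaming the variable to something longer, or relaxing the pattern in the project's Pylint configuration, would both silence the warning.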

                mol, self.radius, nBits=self.n_feats,
                useFeatures=self.use_features,
                useBondTypes=self.use_bond_types,
                useChirality=self.use_chirality)

            res = np.array(0)
            ConvertToNumpyArray(fp, res)
            res = res.astype(np.uint8)

        else:

            if self.n_feats <= 0:

                res = GetMorganFingerprint(
                    mol, self.radius,
                    useFeatures=self.use_features,
                    useBondTypes=self.use_bond_types,
                    useChirality=self.use_chirality)

                res = res.GetNonzeroElements()
                if self.as_bits:
                    res = {k: int(v > 0) for k, v in res.items()}

            else:
                res = GetHashedMorganFingerprint(
                    mol, self.radius, nBits=self.n_feats,
                    useFeatures=self.use_features,
                    useBondTypes=self.use_bond_types,
                    useChirality=self.use_chirality)

                res = np.array(list(res))

        return res

    @property
    def name(self):
Coding Style introduced by
This method should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below is an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend reading PEP-257: Docstring Conventions.

Coding Style introduced by
This method could be written as a function/class method.

If a method does not access any attributes of the class, it could also be implemented as a function or static method. This can help improve readability. For example,

class Foo:
    def some_method(self, x, y):
        return x + y

could be written as

class Foo:
    @classmethod
    def some_method(cls, x, y):
        return x + y
        return 'morg'

    @property
    def columns(self):
        return pd.RangeIndex(self.n_feats, name='morgan_fp_idx')

    def grad(self, mol):

        """ Calculate the pseudo gradient with respect to the atoms.

        The pseudo gradient is the number of times the atom set that particular
        bit.

        Args:
            mol (skchem.Mol):
                The molecule for which to calculate the pseudo gradient.

        Returns:
            pandas.DataFrame:
                Dataframe of pseudogradients, with columns corresponding to
                atoms, and rows corresponding to features of the fingerprint.
        """

        cols = pd.Index(list(range(len(mol.atoms))), name='atom_idx')
        dist = GetDistanceMatrix(mol)

        info = {}

        if self.n_feats < 0:

            res = GetMorganFingerprint(mol, self.radius,
                                       useFeatures=self.use_features,
                                       useBondTypes=self.use_bond_types,
                                       useChirality=self.use_chirality,
                                       bitInfo=info).GetNonzeroElements()
            idx_list = list(res.keys())
            idx = pd.Index(idx_list, name='features')
            grad = np.zeros((len(idx), len(cols)))
            for bit in info:
                for atom_idx, radius in info[bit]:
                    grad[idx_list.index(bit)] += (dist <= radius)[atom_idx]

        else:

            GetHashedMorganFingerprint(mol, self.radius, nBits=self.n_feats,
                                       useFeatures=self.use_features,
                                       useBondTypes=self.use_bond_types,
                                       useChirality=self.use_chirality,
                                       bitInfo=info)

            idx = pd.Index(range(self.n_feats), name='features')
            grad = np.zeros((len(idx), len(cols)))

            for bit in info:
                for atom_idx, radius in info[bit]:
                    grad[bit] += (dist <= radius)[atom_idx]

        grad = pd.DataFrame(grad, index=idx, columns=cols)

        if self.as_bits:
            grad = (grad > 0)

        return grad.astype(int)


class AtomPairFeaturizer(Transformer, Featurizer):

    """ Atom Pair Fingerprints, implemented by RDKit. """

    def __init__(self, min_length=1, max_length=30, n_feats=2048,
best-practice introduced by
Too many arguments (8/5)
                 as_bits=False, use_chirality=False, n_jobs=1, verbose=True):

        """ Instantiate an atom pair fingerprinter.

        Args:
            min_length (int):
                The minimum length of paths between pairs.
                Default is `1`, i.e. pairs can be bonded together.

            max_length (int):
                The maximum length of paths between pairs.
                Default is `30`.

            n_feats (int):
                The number of features to which to fold the fingerprint down.
                For unfolded, use `-1`.
                Default is `2048`.

            as_bits (bool):
                Whether to return bits (`True`) or counts (`False`).
                Default is `False`.

            use_chirality (bool):
                Whether to use chirality to differentiate environments.
                Default is `False`.

            n_jobs (int):
                The number of processes to run the featurizer in.

            verbose (bool):
                Whether to output a progress bar.
        """

        super(AtomPairFeaturizer, self).__init__(n_jobs=n_jobs,
                                                 verbose=verbose)
        self.min_length = min_length
        self.max_length = max_length
        self.n_feats = n_feats
        self.sparse = self.n_feats < 0
        self.as_bits = as_bits
        self.use_chirality = use_chirality

    def _transform_mol(self, mol):
Duplication introduced by
This code seems to be duplicated in your project.

        """Private method to transform a skchem molecule.

        Use `transform` for the public method, which genericizes the argument to
        iterables of mols.

        Args:
            mol (skchem.Mol): Molecule to calculate fingerprint for.

        Returns:
            np.array or dict:
                Fingerprint as an array (or a dict if sparse).
        """

        if self.as_bits and self.n_feats > 0:

            fp = GetHashedAtomPairFingerprintAsBitVect(
Coding Style Naming introduced by
The name fp does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
                mol, nBits=self.n_feats, minLength=self.min_length,
                maxLength=self.max_length, includeChirality=self.use_chirality)

            res = np.array(0)
            ConvertToNumpyArray(fp, res)
            res = res.astype(np.uint8)

        else:

            if self.n_feats <= 0:

                # The unfolded fingerprint takes no nBits argument.
                res = GetAtomPairFingerprint(
                    mol, minLength=self.min_length,
                    maxLength=self.max_length,
                    includeChirality=self.use_chirality)

                res = res.GetNonzeroElements()
                if self.as_bits:
                    res = {k: int(v > 0) for k, v in res.items()}

            else:
                res = GetHashedAtomPairFingerprint(
                    mol, nBits=self.n_feats, minLength=self.min_length,
                    maxLength=self.max_length,
                    includeChirality=self.use_chirality)

                res = np.array(list(res))

        return res

    @property
    def name(self):
Coding Style introduced by
This method should have a docstring.
Coding Style introduced by
This method could be written as a function/class method.
        return 'atom_pair'

    @property
    def columns(self):
        return pd.RangeIndex(self.n_feats, name='ap_fp_idx')


class TopologicalTorsionFeaturizer(Transformer, Featurizer):

    """ Topological Torsion fingerprints, implemented by RDKit. """

    def __init__(self, target_size=4, n_feats=2048, as_bits=False,
best-practice introduced by
Too many arguments (7/5)
                 use_chirality=False, n_jobs=1, verbose=True):

        """ Initialize a TopologicalTorsionFeaturizer object.

        Args:
            target_size (int):
                # TODO

            n_feats (int):
                The number of features to which to fold the fingerprint down.
                For unfolded, use `-1`.
                Default is `2048`.

            as_bits (bool):
                Whether to return bits (`True`) or counts (`False`).
                Default is `False`.

            use_chirality (bool):
                Whether to use chirality to differentiate environments.
                Default is `False`.

            n_jobs (int):
                The number of processes to run the featurizer in.

            verbose (bool):
                Whether to output a progress bar.
        """

        self.target_size = target_size
        self.n_feats = n_feats
        self.sparse = self.n_feats < 0
        self.as_bits = as_bits
        self.use_chirality = use_chirality
        super(TopologicalTorsionFeaturizer, self).__init__(n_jobs=n_jobs,
                                                           verbose=verbose)

    def _transform_mol(self, mol):
Duplication introduced by
This code seems to be duplicated in your project.
        """ Private method to transform a skchem molecule.

        Args:
            mol (skchem.Mol): Molecule to calculate fingerprint for.

        Returns:
            np.array or dict:
                Fingerprint as an array (or a dict if sparse).
        """

        if self.as_bits and self.n_feats > 0:

            fp = GetHashedTopologicalTorsionFingerprintAsBitVect(
Coding Style Naming introduced by
The name fp does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
                mol, nBits=self.n_feats, targetSize=self.target_size,
                includeChirality=self.use_chirality)

            res = np.array(0)
            ConvertToNumpyArray(fp, res)
            res = res.astype(np.uint8)

        else:

            if self.n_feats <= 0:

                # The unfolded fingerprint takes no nBits argument.
                res = GetTopologicalTorsionFingerprint(
                    mol, targetSize=self.target_size,
                    includeChirality=self.use_chirality)

                res = res.GetNonzeroElements()
                if self.as_bits:
                    res = {k: int(v > 0) for k, v in res.items()}

            else:
                res = GetHashedTopologicalTorsionFingerprint(
                    mol, nBits=self.n_feats, targetSize=self.target_size,
                    includeChirality=self.use_chirality)

                res = np.array(list(res))

        return res

    @property
    def name(self):
Coding Style introduced by
This method should have a docstring.
Coding Style introduced by
This method could be written as a function/class method.
        return 'top_tort'

    @property
    def columns(self):
        return pd.RangeIndex(self.n_feats, name='tt_fp_idx')


class MACCSFeaturizer(Transformer, Featurizer):

    """ MACCS Keys Fingerprints."""

    def __init__(self, n_jobs=1, verbose=True):

        """ Initialize a MACCS Featurizer.

        Args:
            n_jobs (int):
                The number of processes to run the featurizer in.

            verbose (bool):
                Whether to output a progress bar.
        """

        super(MACCSFeaturizer, self).__init__(n_jobs=n_jobs, verbose=verbose)
        self.n_feats = 166

    def _transform_mol(self, mol):
        return np.array(list(GetMACCSKeysFingerprint(mol)))[1:]

    @property
    def name(self):
Coding Style introduced by
This method should have a docstring.
Coding Style introduced by
This method could be written as a function/class method.
        return 'maccs'

    @property
    def columns(self):
        return pd.Index(
            ['ISOTOPE', '103 < ATOMIC NO. < 256',
             'GROUP IVA,VA,VIA PERIODS 4-6 (Ge...)', 'ACTINIDE',
             'GROUP IIIB,IVB (Sc...)', 'LANTHANIDE',
             'GROUP VB,VIB,VIIB (V...)', 'QAAA@1', 'GROUP VIII (Fe...)',
             'GROUP IIA (ALKALINE EARTH)', '4M RING', 'GROUP IB,IIB (Cu...)',
             'ON(C)C', 'S-S', 'OC(O)O', 'QAA@1', 'CTC',
             'GROUP IIIA (B...)', '7M RING', 'SI', 'C=C(Q)Q', '3M RING',
             'NC(O)O', 'N-O', 'NC(N)N', 'C$=C($A)$A', 'I',
             'QCH2Q', 'P', 'CQ(C)(C)A', 'QX', 'CSN', 'NS', 'CH2=A',
             'GROUP IA (ALKALI METAL)', 'S HETEROCYCLE',
             'NC(O)N', 'NC(C)N', 'OS(O)O', 'S-O', 'CTN', 'F', 'QHAQH', 'OTHER',
             'C=CN', 'BR', 'SAN', 'OQ(O)O', 'CHARGE',
             'C=C(C)C', 'CSO', 'NN', 'QHAAAQH', 'QHAAQH', 'OSO', 'ON(O)C',
             'O HETEROCYCLE', 'QSQ', 'Snot%A%A', 'S=O',
             'AS(A)A', 'A$A!A$A', 'N=O', 'A$A!S', 'C%N', 'CC(C)(C)A', 'QS',
             'QHQH (&...)', 'QQH', 'QNQ', 'NO', 'OAAO',
             'S=A', 'CH3ACH3', 'A!N$A', 'C=C(A)A', 'NAN', 'C=N', 'NAAN',
             'NAAAN', 'SA(A)A', 'ACH2QH', 'QAAAA@1', 'NH2',
             'CN(C)C', 'CH2QCH2', 'X!A$A', 'S', 'OAAAO', 'QHAACH2A',
             'QHAAACH2A', 'OC(N)C', 'QCH3', 'QN', 'NAAO',
             '5M RING', 'NAAAO', 'QAAAAA@1', 'C=C', 'ACH2N', '8M RING', 'QO',
             'CL', 'QHACH2A', 'A$A($A)$A', 'QA(Q)Q',
             'XA(A)A', 'CH3AAACH2A', 'ACH2O', 'NCO', 'NACH2A', 'AA(A)(A)A',
             'Onot%A%A', 'CH3CH2A', 'CH3ACH2A',
             'CH3AACH2A', 'NAO', 'ACH2CH2A > 1', 'N=A',
             'HETEROCYCLIC ATOM > 1 (&...)', 'N HETEROCYCLE', 'AN(A)A',
             'OCO', 'QQ', 'AROMATIC RING > 1', 'A!O!A', 'A$A!O > 1 (&...)',
             'ACH2AAACH2A', 'ACH2AACH2A',
             'QQ > 1 (&...)', 'QH > 1', 'OACH2A', 'A$A!N', 'X (HALOGEN)',
             'Nnot%A%A', 'O=A > 1', 'HETEROCYCLE',
             'QCH2A > 1 (&...)', 'OH', 'O > 3 (&...)', 'CH3 > 2 (&...)',
             'N > 1', 'A$A!O', 'Anot%A%Anot%A',
             '6M RING > 1', 'O > 2', 'ACH2CH2A', 'AQ(A)A', 'CH3 > 1',
             'A!A$A!A', 'NH', 'OC(C)C', 'QCH2A', 'C=O',
             'A!CH2!A', 'NA(A)A', 'C-O', 'C-N', 'O > 1', 'CH3', 'N',
             'AROMATIC', '6M RING', 'O', 'RING', 'FRAGMENTS'],
            name='maccs_idx')


class ErGFeaturizer(Transformer, Featurizer):

    """ Extended Reduced Graph Fingerprints.

     Implemented in RDKit."""

    def __init__(self, atom_types=0, fuzz_increment=0.3, min_path=1,
best-practice introduced by
Too many arguments (7/5)
                 max_path=15, n_jobs=1, verbose=True):

        """ Initialize an ErGFeaturizer object.

        # TODO complete docstring
Coding Style introduced by
TODO and FIXME comments should generally be avoided.
552
553
        Args:
554
            atom_types (AtomPairsParameters):
555
                The atom types to use.
556
557
            fuzz_increment (float):
558
                The fuzz increment.
559
560
            min_path (int):
561
                The minimum path.
562
563
            max_path (int):
564
                The maximum path.
565
566
            n_jobs (int):
567
                The number of processes to run the featurizer in.
568
569
            verbose (bool):
570
                Whether to output a progress bar.
571
        """
572
573
        super(ErGFeaturizer, self).__init__(n_jobs=n_jobs, verbose=verbose)
574
        self.atom_types = atom_types
575
        self.fuzz_increment = fuzz_increment
576
        self.min_path = min_path
577
        self.max_path = max_path
578
        self.n_feats = 315
579
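    def _transform_mol(self, mol):

        # Note: atom_types, fuzz_increment, min_path and max_path are stored
        # on the instance but are not forwarded to GetErGFingerprint here, so
        # RDKit's defaults apply.
        return np.array(GetErGFingerprint(mol))
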
    @property
    def name(self):
        # Review note (Coding Style): This method should have a docstring.
        # The coding style of this project requires that you add a docstring
        # to this code element. An example for methods:
        #
        #     class SomeClass:
        #         def some_method(self):
        #             """Do x and return foo."""
        #
        # For more on docstrings, see PEP-257: Docstring Conventions.
        #
        # Review note (Coding Style): This method could be written as a
        # function/class method. If a method does not access any attributes of
        # the class, it could also be implemented as a function or static
        # method. This can help improve readability. For example
        #
        #     class Foo:
        #         def some_method(self, x, y):
        #             return x + y
        #
        # could be written as
        #
        #     class Foo:
        #         @classmethod
        #         def some_method(cls, x, y):
        #             return x + y
        return 'erg'

    @property
    def columns(self):
        return pd.RangeIndex(self.n_feats, name='erg_fp_idx')
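
The hardcoded `n_feats = 315` in `ErGFeaturizer` appears to correspond to the default path range. A minimal sketch of the arithmetic follows, assuming (this is not stated in the source) that the ErG vector has one slot per unordered pair of six pharmacophore property types and per path length between `min_path` and `max_path`. The helper `erg_vector_length` is hypothetical and is not part of skchem or RDKit.

```python
# Assumption: six pharmacophore property types (donor, acceptor, positive,
# negative, hydrophobic, aromatic). This constant is illustrative only.
N_PROPERTY_TYPES = 6


def erg_vector_length(min_path=1, max_path=15, n_types=N_PROPERTY_TYPES):
    """Return the expected ErG vector length for a given path range."""
    # Unordered pairs of property types, including a type paired with itself.
    n_type_pairs = n_types * (n_types + 1) // 2
    # One distance bin per path length in [min_path, max_path].
    n_distance_bins = max_path - min_path + 1
    return n_type_pairs * n_distance_bins


print(erg_vector_length())             # 315, the value hardcoded in n_feats
print(erg_vector_length(max_path=10))  # 210
```

Under this assumption, a non-default `min_path` or `max_path` would change the vector length, so `n_feats` and the `columns` index would need to be derived from those arguments rather than fixed at 315.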


class FeatureInvariantsFeaturizer(Transformer, Featurizer):

    """ Feature invariants fingerprints. """

    def __init__(self, n_jobs=1, verbose=True):

        """ Initialize a FeatureInvariantsFeaturizer.

        Args:
            n_jobs (int):
                The number of processes to run the featurizer in.

            verbose (bool):
                Whether to output a progress bar.
        """
        super(FeatureInvariantsFeaturizer, self).__init__(n_jobs=n_jobs,
                                                          verbose=verbose)
        raise NotImplementedError

    def _transform_mol(self, mol):

        return np.array(GetFeatureInvariants(mol))

    @property
    def name(self):
        # Review note (Coding Style): This method should have a docstring.
        # Review note (Coding Style): This method could be written as a
        # function/class method.
        return 'feat_inv'

    @property
    def columns(self):
        return None


class ConnectivityInvariantsFeaturizer(Transformer, Featurizer):

    """ Connectivity invariants fingerprints. """

    def __init__(self, include_ring_membership=True, n_jobs=1,
                 verbose=True):

        """ Initialize a ConnectivityInvariantsFeaturizer.

        Args:
            include_ring_membership (bool):
                Whether ring membership is considered when generating the
                invariants.

            n_jobs (int):
                The number of processes to run the featurizer in.

            verbose (bool):
                Whether to output a progress bar.
        """
        # `super()` already binds `self`, so it is not passed explicitly.
        super(ConnectivityInvariantsFeaturizer, self).__init__(
            n_jobs=n_jobs, verbose=verbose)
        self.include_ring_membership = include_ring_membership
        raise NotImplementedError  # this is a sparse descriptor

    def _transform_mol(self, mol):

        return np.array(GetConnectivityInvariants(mol))

    @property
    def name(self):
        # Review note (Coding Style): This method should have a docstring.
        # Review note (Coding Style): This method could be written as a
        # function/class method.
        return 'conn_inv'

    @property
    def columns(self):
        return None
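
When a parent initializer is called through `super()`, the instance is already bound, so `self` should not be passed again as an argument. The sketch below illustrates the failure mode with hypothetical classes (`Base`, `PassesSelfTwice`, `BindsSelfOnce`); it assumes a parent whose first parameters are `n_jobs` and `verbose`, which may differ from the actual skchem base classes.

```python
class Base(object):
    """Hypothetical parent taking the same keyword arguments."""

    def __init__(self, n_jobs=1, verbose=True):
        self.n_jobs = n_jobs
        self.verbose = verbose


class PassesSelfTwice(Base):
    def __init__(self, n_jobs=1, verbose=True):
        # The extra `self` fills the positional `n_jobs` slot, which then
        # clashes with the `n_jobs=` keyword and raises TypeError.
        super(PassesSelfTwice, self).__init__(self, n_jobs=n_jobs,
                                              verbose=verbose)


class BindsSelfOnce(Base):
    def __init__(self, n_jobs=1, verbose=True):
        # `super()` returns a bound proxy, so only the real arguments go in.
        super(BindsSelfOnce, self).__init__(n_jobs=n_jobs, verbose=verbose)


try:
    PassesSelfTwice()
    failed = False
except TypeError as exc:
    failed = True
    print('TypeError:', exc)

ok = BindsSelfOnce(n_jobs=2)
print(ok.n_jobs)  # 2
```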


# Review note (best-practice): Too many instance attributes (9/7).
class RDKFeaturizer(Transformer, Featurizer):

    """ RDKit fingerprint """

    # Review note (best-practice): Too many arguments (12/5).
    def __init__(self, min_path=1, max_path=7, n_feats=2048, n_bits_per_hash=2,
                 use_hs=True, target_density=0.0, min_size=128,
                 branched_paths=True, use_bond_types=True, n_jobs=1,
                 verbose=True):

        """ RDK fingerprints

        Args:
            min_path (int):
                Minimum number of bonds to include in the subgraphs.

            max_path (int):
                Maximum number of bonds to include in the subgraphs.

            n_feats (int):
                The number of features to which to fold the fingerprint down.
                For unfolded, use `-1`.

            n_bits_per_hash (int):
                Number of bits to set per path.

            use_hs (bool):
                Include paths involving Hs in the fingerprint if the molecule
                has explicit Hs.

            target_density (float):
                Fold the fingerprint until this minimum density has been
                reached.

            min_size (int):
                The minimum size the fingerprint will be folded to when trying
                to reach `target_density`.

            branched_paths (bool):
                If set, both branched and unbranched paths are used in the
                fingerprint.

            use_bond_types (bool):
                If set, bond orders are used in the path hashes.

            n_jobs (int):
                The number of processes to run the featurizer in.

            verbose (bool):
                Whether to output a progress bar.

        """

        super(RDKFeaturizer, self).__init__(n_jobs=n_jobs, verbose=verbose)

        self.min_path = min_path
        self.max_path = max_path
        self.n_feats = n_feats
        self.n_bits_per_hash = n_bits_per_hash
        self.use_hs = use_hs
        self.target_density = target_density
        self.min_size = min_size
        self.branched_paths = branched_paths
        self.use_bond_types = use_bond_types

    def _transform_mol(self, mol):

        return np.array(list(RDKFingerprint(mol, minPath=self.min_path,
                                            maxPath=self.max_path,
                                            fpSize=self.n_feats,
                                            nBitsPerHash=self.n_bits_per_hash,
                                            useHs=self.use_hs,
                                            tgtDensity=self.target_density,
                                            minSize=self.min_size,
                                            branchedPaths=self.branched_paths,
                                            useBondOrder=self.use_bond_types)))

    @property
    def name(self):
        # Review note (Coding Style): This method should have a docstring.
        # Review note (Coding Style): This method could be written as a
        # function/class method.
        return 'rdkfp'

    @property
    def columns(self):
        return pd.RangeIndex(self.n_feats, name='rdk_fp_idx')
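
The `n_bits_per_hash`, `target_density` and `min_size` arguments of `RDKFeaturizer` describe a hashed, foldable bit vector. The pure-Python sketch below illustrates the general idea only: the helpers `path_bits`, `fold_once` and `fold_to_density` are hypothetical, the MD5-based hash is a stand-in chosen for reproducibility, and none of this is RDKit's actual implementation.

```python
import hashlib


def path_bits(path, n_feats, n_bits_per_hash):
    """Map one path string to `n_bits_per_hash` positions in [0, n_feats)."""
    positions = []
    for i in range(n_bits_per_hash):
        # Stand-in hash (an assumption): MD5 of the path plus a counter.
        key = '{}|{}'.format(path, i).encode('utf-8')
        positions.append(int(hashlib.md5(key).hexdigest(), 16) % n_feats)
    return positions


def fold_once(bits):
    """Halve the vector by OR-ing its first and second halves."""
    half = len(bits) // 2
    return [a | b for a, b in zip(bits[:half], bits[half:])]


def fold_to_density(bits, target_density, min_size):
    """Fold until the fraction of set bits reaches `target_density`,
    without going below `min_size` bits."""
    while (sum(bits) / float(len(bits)) < target_density
           and len(bits) // 2 >= min_size):
        bits = fold_once(bits)
    return bits


n_feats = 2048
bits = [0] * n_feats
for path in ['C-C', 'C-O', 'C-C-O']:   # made-up path labels
    for pos in path_bits(path, n_feats, n_bits_per_hash=2):
        bits[pos] = 1

folded = fold_to_density(bits, target_density=0.01, min_size=128)
unfolded = fold_to_density(bits, target_density=0.0, min_size=128)
print(len(folded), sum(folded))
print(len(unfolded))  # 2048: a target density of 0.0 never triggers folding
```

In this sketch the default `target_density=0.0` leaves the vector at `n_feats` bits, which is consistent with `columns` returning a `RangeIndex` of length `n_feats`; a non-zero target density could yield a shorter vector.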