Completed: push to master (9b3def...87dea9) by Rich, 13:08

MorganFeaturizer.__init__()   A

Complexity: Conditions 1
Size: Total Lines 48
Duplication: Lines 0, Ratio 0 %
Code Coverage: Tests 9, CRAP Score 1
Importance: Changes 1, Bugs 0, Features 0
#! /usr/bin/env python
#
# Copyright (C) 2007-2009 Rich Lewis <[email protected]>
# License: 3-clause BSD

"""
## skchem.descriptors.fingerprints

Fingerprinting classes and associated functions are defined.
"""

import pandas as pd
from rdkit.Chem import GetDistanceMatrix
from rdkit.DataStructs import ConvertToNumpyArray
from rdkit.Chem.rdMolDescriptors import (
    GetMorganFingerprint,
    GetHashedMorganFingerprint,
    GetMorganFingerprintAsBitVect,
    GetAtomPairFingerprint,
    GetHashedAtomPairFingerprint,
    GetHashedAtomPairFingerprintAsBitVect,
    GetTopologicalTorsionFingerprint,
    GetHashedTopologicalTorsionFingerprint,
    GetHashedTopologicalTorsionFingerprintAsBitVect,
    GetMACCSKeysFingerprint,
    GetFeatureInvariants,
    GetConnectivityInvariants)
from rdkit.Chem.rdReducedGraphs import GetErGFingerprint
from rdkit.Chem.rdmolops import RDKFingerprint
import numpy as np
from ..base import Transformer, Featurizer


class MorganFeaturizer(Transformer, Featurizer):
    """ Morgan fingerprints, implemented by RDKit.

    Notes:

    Currently, folded bits are by far the fastest implementation.

    Because the calculation is so fast, the current parallel code is unlikely
    to give a speedup: more time is spent moving data between processes than
    calculating the fingerprints in a single process would take.

    Examples:

    >>> import skchem
    >>> import pandas as pd
    >>> pd.options.display.max_rows = pd.options.display.max_columns = 5

    >>> mf = skchem.descriptors.MorganFeaturizer()
    >>> m = skchem.Mol.from_smiles('CCC')

    Can transform an individual molecule to yield a Series:

    >>> mf.transform(m)
    morgan_fp_idx
    0       0
    1       0
    ..
    2046    0
    2047    0
    Name: MorganFeaturizer, dtype: uint8

    Can transform a list of molecules to yield a DataFrame:

    >>> mf.transform([m])
    morgan_fp_idx  0     1     ...   2046  2047
    0                 0     0  ...      0     0
    <BLANKLINE>
    [1 rows x 2048 columns]

    Change the number of features the fingerprint is folded down to using
    `n_feats`.

    >>> mf.n_feats = 1024
    >>> mf.transform(m)
    morgan_fp_idx
    0       0
    1       0
    ..
    1022    0
    1023    0
    Name: MorganFeaturizer, dtype: uint8

    Count fingerprints with `as_bits = False`:

    >>> mf.as_bits = False
    >>> res = mf.transform(m); res[res > 0]
    morgan_fp_idx
    33     2
    80     1
    294    2
    320    1
    Name: MorganFeaturizer, dtype: int64

    The pseudo-gradient from `grad` shows which atoms contributed to which
    feature.

    >>> mf.grad(m)[res > 0]
    atom_idx  0  1  2
    features
    33        1  0  1
    80        0  1  0
    294       1  2  1
    320       1  1  1

    """
    def __init__(self, radius=2, n_feats=2048, as_bits=True,
                 use_features=False, use_bond_types=True, use_chirality=False,
                 n_jobs=1, verbose=True):

        """ Initialize the fingerprinter object.

        Args:
            radius (int):
                The maximum radius for atom environments.
                Default is `2`.

            n_feats (int):
                The number of features to which to fold the fingerprint down.
                For unfolded, use `-1`.
                Default is `2048`.

            as_bits (bool):
                Whether to return bits (`True`) or counts (`False`).
                Default is `True`.

            use_features (bool):
                Whether to map atom types to generic features (FCFP).
                Default is `False`.

            use_bond_types (bool):
                Whether to use bond types to differentiate environments.
                Default is `True`.

            use_chirality (bool):
                Whether to use chirality to differentiate environments.
                Default is `False`.

            n_jobs (int):
                The number of processes to run the featurizer in.

            verbose (bool):
                Whether to output a progress bar.

        """

        super(MorganFeaturizer, self).__init__(n_jobs=n_jobs, verbose=verbose)
        self.radius = radius
        self.n_feats = n_feats
        self.sparse = self.n_feats < 0
        self.as_bits = as_bits
        self.use_features = use_features
        self.use_bond_types = use_bond_types
        self.use_chirality = use_chirality

    def _transform_mol(self, mol):

        """Private method to transform a skchem molecule.

        Use `transform` for the public method, which genericizes the argument
        to iterables of mols.

        Args:
            mol (skchem.Mol): Molecule to calculate fingerprint for.

        Returns:
            np.array or dict:
                Fingerprint as an array (or a dict if sparse).
        """

        if self.as_bits and self.n_feats > 0:

            fp = GetMorganFingerprintAsBitVect(
                mol, self.radius, nBits=self.n_feats,
                useFeatures=self.use_features,
                useBondTypes=self.use_bond_types,
                useChirality=self.use_chirality)

            res = np.array(0)
            ConvertToNumpyArray(fp, res)
            res = res.astype(np.uint8)

        else:

            if self.n_feats <= 0:

                res = GetMorganFingerprint(
                    mol, self.radius,
                    useFeatures=self.use_features,
                    useBondTypes=self.use_bond_types,
                    useChirality=self.use_chirality)

                res = res.GetNonzeroElements()
                if self.as_bits:
                    res = {k: int(v > 0) for k, v in res.items()}

            else:
                res = GetHashedMorganFingerprint(
                    mol, self.radius, nBits=self.n_feats,
                    useFeatures=self.use_features,
                    useBondTypes=self.use_bond_types,
                    useChirality=self.use_chirality)

                res = np.array(list(res))

        return res

    @property
    def name(self):
        return 'morg'

    @property
    def columns(self):
        return pd.RangeIndex(self.n_feats, name='morgan_fp_idx')

    def grad(self, mol):

        """ Calculate the pseudo gradient with respect to the atoms.

        The pseudo gradient for a bit and an atom is the number of atom
        environments (a centre atom and a radius) that set that bit and
        contain the atom.

        Args:
            mol (skchem.Mol):
                The molecule for which to calculate the pseudo gradient.

        Returns:
            pandas.DataFrame:
                DataFrame of pseudo gradients, with columns corresponding to
                atoms, and rows corresponding to features of the fingerprint.
        """

        cols = pd.Index(list(range(len(mol.atoms))), name='atom_idx')
        dist = GetDistanceMatrix(mol)

        info = {}

        if self.n_feats < 0:

            res = GetMorganFingerprint(mol, self.radius,
                                       useFeatures=self.use_features,
                                       useBondTypes=self.use_bond_types,
                                       useChirality=self.use_chirality,
                                       bitInfo=info).GetNonzeroElements()
            idx_list = list(res.keys())
            idx = pd.Index(idx_list, name='features')
            grad = np.zeros((len(idx), len(cols)))
            for bit in info:
                for atom_idx, radius in info[bit]:
                    grad[idx_list.index(bit)] += (dist <= radius)[atom_idx]

        else:

            GetHashedMorganFingerprint(mol, self.radius, nBits=self.n_feats,
                                       useFeatures=self.use_features,
                                       useBondTypes=self.use_bond_types,
                                       useChirality=self.use_chirality,
                                       bitInfo=info)

            idx = pd.Index(range(self.n_feats), name='features')
            grad = np.zeros((len(idx), len(cols)))

            for bit in info:
                for atom_idx, radius in info[bit]:
                    grad[bit] += (dist <= radius)[atom_idx]

        grad = pd.DataFrame(grad, index=idx, columns=cols)

        if self.as_bits:
            grad = (grad > 0)

        return grad.astype(int)
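The pseudo-gradient computed by `grad` can be illustrated without RDKit. Below is a minimal pure-Python sketch, assuming a topological distance matrix and a `bitInfo`-style mapping from each bit to the `(centre atom, radius)` environments that set it are already available (in `grad` these come from `GetDistanceMatrix` and the `bitInfo` argument). `pseudo_gradient` is a hypothetical helper, not part of skchem, and the `bit_info` values are assumed, chosen to be consistent with the propane example in the class docstring.

```python
from typing import Dict, List, Tuple


def pseudo_gradient(dist: List[List[int]],
                    bit_info: Dict[int, List[Tuple[int, int]]],
                    as_bits: bool = False) -> Dict[int, List[int]]:
    """Hypothetical helper: per bit, count the environments containing each atom.

    An environment (centre, radius) contains every atom whose topological
    distance from the centre is <= radius, as in ``(dist <= radius)[atom_idx]``.
    """
    n_atoms = len(dist)
    grad = {}
    for bit, environments in bit_info.items():
        row = [0] * n_atoms
        for centre, radius in environments:
            for atom in range(n_atoms):
                if dist[centre][atom] <= radius:
                    row[atom] += 1
        # With as_bits, only record whether the atom contributed at all.
        grad[bit] = [int(v > 0) for v in row] if as_bits else row
    return grad


# Topological distance matrix for the carbons of propane (CCC): 0-1-2.
propane_dist = [[0, 1, 2],
                [1, 0, 1],
                [2, 1, 0]]

# Assumed bitInfo: bit -> (centre atom, radius) environments that set it.
bit_info = {33: [(0, 0), (2, 0)], 80: [(1, 0)],
            294: [(0, 1), (2, 1)], 320: [(1, 1)]}

grad = pseudo_gradient(propane_dist, bit_info)
for bit in sorted(grad):
    print(bit, grad[bit])
# 33 [1, 0, 1]
# 80 [0, 1, 0]
# 294 [1, 2, 1]
# 320 [1, 1, 1]
```

The printed rows match the `mf.grad(m)[res > 0]` output in the class docstring; with `as_bits=True` the `2` for bit 294 becomes a `1`, mirroring `grad > 0`.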


class AtomPairFeaturizer(Transformer, Featurizer):

    """ Atom Pair Fingerprints, implemented by RDKit. """

    def __init__(self, min_length=1, max_length=30, n_feats=2048,
                 as_bits=False, use_chirality=False, n_jobs=1, verbose=True):

        """ Instantiate an atom pair fingerprinter.

        Args:
            min_length (int):
                The minimum topological distance (in bonds) between the two
                atoms of a pair.
                Default is `1`, i.e. directly bonded atoms form pairs.

            max_length (int):
                The maximum topological distance (in bonds) between the two
                atoms of a pair.
                Default is `30`.

            n_feats (int):
                The number of features to which to fold the fingerprint down.
                For unfolded, use `-1`.
                Default is `2048`.

            as_bits (bool):
                Whether to return bits (`True`) or counts (`False`).
                Default is `False`.

            use_chirality (bool):
                Whether to use chirality to differentiate environments.
                Default is `False`.

            n_jobs (int):
                The number of processes to run the featurizer in.

            verbose (bool):
                Whether to output a progress bar.
        """

        super(AtomPairFeaturizer, self).__init__(n_jobs=n_jobs,
                                                 verbose=verbose)
        self.min_length = min_length
        self.max_length = max_length
        self.n_feats = n_feats
        self.sparse = self.n_feats < 0
        self.as_bits = as_bits
        self.use_chirality = use_chirality

    def _transform_mol(self, mol):

        """Private method to transform a skchem molecule.

        Use `transform` for the public method, which genericizes the argument to
        iterables of mols.

        Args:
            mol (skchem.Mol): Molecule to calculate fingerprint for.

        Returns:
            np.array or dict:
                Fingerprint as an array (or a dict if sparse).
        """

        if self.as_bits and self.n_feats > 0:

            fp = GetHashedAtomPairFingerprintAsBitVect(
                mol, nBits=self.n_feats, minLength=self.min_length,
                maxLength=self.max_length, includeChirality=self.use_chirality)

            res = np.array(0)
            ConvertToNumpyArray(fp, res)
            res = res.astype(np.uint8)

        else:

            if self.n_feats <= 0:

                # The unhashed (sparse) fingerprint takes no nBits argument.
                res = GetAtomPairFingerprint(
                    mol, minLength=self.min_length,
                    maxLength=self.max_length,
                    includeChirality=self.use_chirality)

                res = res.GetNonzeroElements()
                if self.as_bits:
                    res = {k: int(v > 0) for k, v in res.items()}

            else:
                res = GetHashedAtomPairFingerprint(
                    mol, nBits=self.n_feats, minLength=self.min_length,
                    maxLength=self.max_length,
                    includeChirality=self.use_chirality)

                res = np.array(list(res))

        return res

    @property
    def name(self):
        return 'atom_pair'

    @property
    def columns(self):
        return pd.RangeIndex(self.n_feats, name='ap_fp_idx')
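`min_length` and `max_length` bound the topological distance between the two atoms of a pair. A minimal pure-Python sketch of that filtering follows, assuming a molecule given as element symbols plus an adjacency list. RDKit's atom pair codes use richer atom invariants and are hashed, so the hypothetical `atom_pair_counts` below only illustrates the distance window and the counts-versus-bits switch, not RDKit's encoding.

```python
from collections import Counter, deque
from typing import Dict, List, Tuple


def topological_distances(adjacency: List[List[int]], start: int) -> List[int]:
    """Breadth-first search: bonds from `start` to each atom (-1 if unreachable)."""
    dist = [-1] * len(adjacency)
    dist[start] = 0
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if dist[nbr] < 0:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def atom_pair_counts(symbols: List[str], adjacency: List[List[int]],
                     min_length: int = 1, max_length: int = 30,
                     as_bits: bool = False) -> Dict[Tuple[str, int, str], int]:
    """Hypothetical helper: count (type, distance, type) pairs in the window."""
    counts: Counter = Counter()
    for i in range(len(symbols)):
        dist = topological_distances(adjacency, i)
        for j in range(i + 1, len(symbols)):
            d = dist[j]
            if d >= 0 and min_length <= d <= max_length:
                a, b = sorted((symbols[i], symbols[j]))  # order-independent key
                counts[(a, d, b)] += 1
    if as_bits:
        return {k: int(v > 0) for k, v in counts.items()}
    return dict(counts)


# Butane heavy atoms as a chain C0-C1-C2-C3 (assumed toy input).
symbols = ["C", "C", "C", "C"]
adjacency = [[1], [0, 2], [1, 3], [2]]

print(atom_pair_counts(symbols, adjacency))
# {('C', 1, 'C'): 3, ('C', 2, 'C'): 2, ('C', 3, 'C'): 1}
print(atom_pair_counts(symbols, adjacency, max_length=2))
# {('C', 1, 'C'): 3, ('C', 2, 'C'): 2}
```

For butane, `max_length=2` drops the pair of terminal carbons, and `as_bits=True` collapses the count of 3 bonded pairs to a single `1`, the same `{k: int(v > 0) ...}` conversion used for the unfolded fingerprints in `_transform_mol`.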


class TopologicalTorsionFeaturizer(Transformer, Featurizer):

    """ Topological Torsion fingerprints, implemented by RDKit. """

    def __init__(self, target_size=4, n_feats=2048, as_bits=False,
                 use_chirality=False, n_jobs=1, verbose=True):

        """ Initialize a TopologicalTorsionFeaturizer object.

        Args:
            target_size (int):
                # TODO

            n_feats (int):
                The number of features to which to fold the fingerprint down.
                For unfolded, use `-1`.
                Default is `2048`.

            as_bits (bool):
                Whether to return bits (`True`) or counts (`False`).
                Default is `False`.

            use_chirality (bool):
                Whether to use chirality to differentiate environments.
                Default is `False`.

            n_jobs (int):
                The number of processes to run the featurizer in.

            verbose (bool):
                Whether to output a progress bar.
        """

        self.target_size = target_size
        self.n_feats = n_feats
        self.sparse = self.n_feats < 0
        self.as_bits = as_bits
        self.use_chirality = use_chirality
        super(TopologicalTorsionFeaturizer, self).__init__(n_jobs=n_jobs,
                                                           verbose=verbose)

    def _transform_mol(self, mol):
        """ Private method to transform a skchem molecule.

        Args:
            mol (skchem.Mol): Molecule to calculate fingerprint for.

        Returns:
            np.array or dict:
                Fingerprint as an array (or a dict if sparse).
        """

        if self.as_bits and self.n_feats > 0:

            fp = GetHashedTopologicalTorsionFingerprintAsBitVect(
                mol, nBits=self.n_feats, targetSize=self.target_size,
                includeChirality=self.use_chirality)

            res = np.array(0)
            ConvertToNumpyArray(fp, res)
            res = res.astype(np.uint8)

        else:

            if self.n_feats <= 0:

                # The unhashed (sparse) fingerprint takes no nBits argument.
                res = GetTopologicalTorsionFingerprint(
                    mol, targetSize=self.target_size,
                    includeChirality=self.use_chirality)

                res = res.GetNonzeroElements()
                if self.as_bits:
                    res = {k: int(v > 0) for k, v in res.items()}

            else:
                res = GetHashedTopologicalTorsionFingerprint(
                    mol, nBits=self.n_feats, targetSize=self.target_size,
                    includeChirality=self.use_chirality)

                res = np.array(list(res))

        return res

    @property
    def name(self):
        return 'top_tort'

    @property
    def columns(self):
        return pd.RangeIndex(self.n_feats, name='tt_fp_idx')
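The docstring leaves `target_size` as a TODO. Our reading, which should be checked against the RDKit documentation, is that it is the number of atoms in each torsion path, so the default `4` corresponds to the classic four-atom torsion. Under that assumption, the sketch below enumerates the linear atom paths a topological torsion fingerprint is built from; `linear_paths` is a hypothetical helper, and RDKit additionally encodes atom types along each path and hashes the result, which is not shown.

```python
from typing import List, Tuple


def linear_paths(adjacency: List[List[int]],
                 target_size: int = 4) -> List[Tuple[int, ...]]:
    """Hypothetical helper: simple paths of `target_size` bonded atoms.

    A path and its reverse describe the same torsion, so only one
    orientation (the lexicographically smaller one) is kept.
    """
    paths = set()

    def extend(path: Tuple[int, ...]) -> None:
        if len(path) == target_size:
            paths.add(min(path, path[::-1]))
            return
        for nbr in adjacency[path[-1]]:
            if nbr not in path:  # simple paths only: no repeated atoms
                extend(path + (nbr,))

    for start in range(len(adjacency)):
        extend((start,))
    return sorted(paths)


butane = [[1], [0, 2], [1, 3], [2]]        # chain C0-C1-C2-C3
isobutane = [[1, 2, 3], [0], [0], [0]]    # central C0 bonded to three methyls

print(linear_paths(butane))                # [(0, 1, 2, 3)]
print(linear_paths(isobutane))             # []
print(linear_paths(isobutane, target_size=3))
# [(1, 0, 2), (1, 0, 3), (2, 0, 3)]
```

Isobutane has no four-atom linear path, so under this reading it would yield no torsions at the default `target_size`, while three-atom paths exist for every pair of methyl groups.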


class MACCSFeaturizer(Transformer, Featurizer):

    """ MACCS Keys Fingerprints."""

    def __init__(self, n_jobs=1, verbose=True):

        """ Initialize a MACCS Featurizer.

        Args:
            n_jobs (int):
                The number of processes to run the featurizer in.

            verbose (bool):
                Whether to output a progress bar.
        """

        super(MACCSFeaturizer, self).__init__(n_jobs=n_jobs, verbose=verbose)
        self.n_feats = 166

    def _transform_mol(self, mol):
        # RDKit's MACCS fingerprint has 167 bits with bit 0 unused (indices
        # follow the 1-based key numbers); drop it to keep the 166 keys.
        return np.array(list(GetMACCSKeysFingerprint(mol)))[1:]

    @property
    def name(self):
        return 'maccs'

    @property
    def columns(self):
        return pd.Index(
            ['ISOTOPE', '103 < ATOMIC NO. < 256',
             'GROUP IVA,VA,VIA PERIODS 4-6 (Ge...)', 'ACTINIDE',
             'GROUP IIIB,IVB (Sc...)', 'LANTHANIDE',
             'GROUP VB,VIB,VIIB (V...)', 'QAAA@1', 'GROUP VIII (Fe...)',
             'GROUP IIA (ALKALINE EARTH)', '4M RING', 'GROUP IB,IIB (Cu...)',
             'ON(C)C', 'S-S', 'OC(O)O', 'QAA@1', 'CTC',
             'GROUP IIIA (B...)', '7M RING', 'SI', 'C=C(Q)Q', '3M RING',
             'NC(O)O', 'N-O', 'NC(N)N', 'C$=C($A)$A', 'I',
             'QCH2Q', 'P', 'CQ(C)(C)A', 'QX', 'CSN', 'NS', 'CH2=A',
             'GROUP IA (ALKALI METAL)', 'S HETEROCYCLE',
             'NC(O)N', 'NC(C)N', 'OS(O)O', 'S-O', 'CTN', 'F', 'QHAQH', 'OTHER',
             'C=CN', 'BR', 'SAN', 'OQ(O)O', 'CHARGE',
             'C=C(C)C', 'CSO', 'NN', 'QHAAAQH', 'QHAAQH', 'OSO', 'ON(O)C',
             'O HETEROCYCLE', 'QSQ', 'Snot%A%A', 'S=O',
             'AS(A)A', 'A$A!A$A', 'N=O', 'A$A!S', 'C%N', 'CC(C)(C)A', 'QS',
             'QHQH (&...)', 'QQH', 'QNQ', 'NO', 'OAAO',
             'S=A', 'CH3ACH3', 'A!N$A', 'C=C(A)A', 'NAN', 'C=N', 'NAAN',
             'NAAAN', 'SA(A)A', 'ACH2QH', 'QAAAA@1', 'NH2',
             'CN(C)C', 'CH2QCH2', 'X!A$A', 'S', 'OAAAO', 'QHAACH2A',
             'QHAAACH2A', 'OC(N)C', 'QCH3', 'QN', 'NAAO',
             '5M RING', 'NAAAO', 'QAAAAA@1', 'C=C', 'ACH2N', '8M RING', 'QO',
             'CL', 'QHACH2A', 'A$A($A)$A', 'QA(Q)Q',
             'XA(A)A', 'CH3AAACH2A', 'ACH2O', 'NCO', 'NACH2A', 'AA(A)(A)A',
             'Onot%A%A', 'CH3CH2A', 'CH3ACH2A',
             'CH3AACH2A', 'NAO', 'ACH2CH2A > 1', 'N=A',
             'HETEROCYCLIC ATOM > 1 (&...)', 'N HETEROCYCLE', 'AN(A)A',
             'OCO', 'QQ', 'AROMATIC RING > 1', 'A!O!A', 'A$A!O > 1 (&...)',
             'ACH2AAACH2A', 'ACH2AACH2A',
             'QQ > 1 (&...)', 'QH > 1', 'OACH2A', 'A$A!N', 'X (HALOGEN)',
             'Nnot%A%A', 'O=A > 1', 'HETEROCYCLE',
             'QCH2A > 1 (&...)', 'OH', 'O > 3 (&...)', 'CH3 > 2 (&...)',
             'N > 1', 'A$A!O', 'Anot%A%Anot%A',
             '6M RING > 1', 'O > 2', 'ACH2CH2A', 'AQ(A)A', 'CH3 > 1',
             'A!A$A!A', 'NH', 'OC(C)C', 'QCH2A', 'C=O',
             'A!CH2!A', 'NA(A)A', 'C-O', 'C-N', 'O > 1', 'CH3', 'N',
             'AROMATIC', '6M RING', 'O', 'RING', 'FRAGMENTS'],
            name='maccs_idx')


class ErGFeaturizer(Transformer, Featurizer):

    """ Extended Reduced Graph Fingerprints.

    Implemented in RDKit."""

    def __init__(self, atom_types=0, fuzz_increment=0.3, min_path=1,
547
                 max_path=15, n_jobs=1, verbose=True):
548
549
        """ Initialize an ErGFeaturizer object.
550
551 1
        # TODO complete docstring
0 ignored issues
show
Coding Style introduced by
TODO and FIXME comments should generally be avoided.
Loading history...
552
553
        Args:
554
            atom_types (AtomPairsParameters):
555 1
                The atom types to use.
556
557
            fuzz_increment (float):
558
                The fuzz increment.
559 1
560
            min_path (int):
561
                The minimum path.
562
563
            max_path (int):
564 1
                The maximum path.
565
566
            n_jobs (int):
567
                The number of processes to run the featurizer in.
568 1
569
            verbose (bool):
570
                Whether to output a progress bar.
571
        """
572
573
        super(ErGFeaturizer, self).__init__(n_jobs=n_jobs, verbose=verbose)
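        self.atom_types = atom_types
        self.fuzz_increment = fuzz_increment
        self.min_path = min_path
        self.max_path = max_path
        # 21 pairs of feature types for each path length from min_path to
        # max_path (315 with the defaults); assumes the default atom types
        self.n_feats = 21 * (max_path - min_path + 1)

    def _transform_mol(self, mol):

        return np.array(GetErGFingerprint(mol, atomTypes=self.atom_types,
                                          fuzzIncrement=self.fuzz_increment,
                                          minPath=self.min_path,
                                          maxPath=self.max_path))

    @property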
    def name(self):
        return 'erg'

    @property
    def columns(self):
        return pd.RangeIndex(self.n_feats, name='erg_fp_idx')


class FeatureInvariantsFeaturizer(Transformer, Featurizer):

    """ Feature invariants fingerprints. """

    def __init__(self, n_jobs=1, verbose=True):

        """ Initialize a FeatureInvariantsFeaturizer.

        Args:
            n_jobs (int):
                The number of processes to run the featurizer in.

            verbose (bool):
                Whether to output a progress bar.
        """
        super(FeatureInvariantsFeaturizer, self).__init__(n_jobs=n_jobs,
                                                          verbose=verbose)
        raise NotImplementedError

    def _transform_mol(self, mol):

        return np.array(GetFeatureInvariants(mol))

    @property
    def name(self):
        return 'feat_inv'

    @property
    def columns(self):
        return None


class ConnectivityInvariantsFeaturizer(Transformer, Featurizer):

    """ Connectivity invariants fingerprints """

    def __init__(self, include_ring_membership=True, n_jobs=1,
                 verbose=True):

        """ Initialize a ConnectivityInvariantsFeaturizer.

        Args:
            include_ring_membership (bool):
                Whether ring membership is considered when generating the
                invariants.

            n_jobs (int):
                The number of processes to run the featurizer in.

            verbose (bool):
                Whether to output a progress bar.
        """
        super(ConnectivityInvariantsFeaturizer, self).__init__(n_jobs=n_jobs,
                                                               verbose=verbose)
        self.include_ring_membership = include_ring_membership
        raise NotImplementedError  # this is a sparse descriptor

    def _transform_mol(self, mol):

        return np.array(GetConnectivityInvariants(
            mol, includeRingMembership=self.include_ring_membership))

    @property
    def name(self):
        return 'conn_inv'

    @property
    def columns(self):
        return None


class RDKFeaturizer(Transformer, Featurizer):

    """ RDKit fingerprint """

    def __init__(self, min_path=1, max_path=7, n_feats=2048, n_bits_per_hash=2,
                 use_hs=True, target_density=0.0, min_size=128,
                 branched_paths=True, use_bond_types=True, n_jobs=1,
                 verbose=True):

        """ Initialize an RDKFeaturizer.

        Args:
            min_path (int):
                The minimum number of bonds to include in the subgraphs.

            max_path (int):
                The maximum number of bonds to include in the subgraphs.

            n_feats (int):
                The number of features to which to fold the fingerprint down.
                For unfolded, use `-1`.

            n_bits_per_hash (int):
                The number of bits to set per path.

            use_hs (bool):
                Whether to include paths involving Hs in the fingerprint if
                the molecule has explicit Hs.

            target_density (float):
                Fold the fingerprint until this minimum density has been
                reached.

            min_size (int):
                The minimum size the fingerprint will be folded to when trying
                to reach `target_density`.

            branched_paths (bool):
                If set, both branched and unbranched paths will be used in the
                fingerprint.

            use_bond_types (bool):
                If set, bond orders will be used in the path hashes.

            n_jobs (int):
                The number of processes to run the featurizer in.

            verbose (bool):
                Whether to output a progress bar.
        """

        super(RDKFeaturizer, self).__init__(n_jobs=n_jobs, verbose=verbose)

        self.min_path = min_path
        self.max_path = max_path
        self.n_feats = n_feats
        self.n_bits_per_hash = n_bits_per_hash
        self.use_hs = use_hs
        self.target_density = target_density
        self.min_size = min_size
        self.branched_paths = branched_paths
        self.use_bond_types = use_bond_types

    def _transform_mol(self, mol):

        return np.array(list(RDKFingerprint(mol, minPath=self.min_path,
                                            maxPath=self.max_path,
                                            fpSize=self.n_feats,
                                            nBitsPerHash=self.n_bits_per_hash,
                                            useHs=self.use_hs,
                                            tgtDensity=self.target_density,
                                            minSize=self.min_size,
                                            branchedPaths=self.branched_paths,
                                            useBondOrder=self.use_bond_types)))

    @property
    def name(self):
        return 'rdkfp'

    @property
    def columns(self):
        return pd.RangeIndex(self.n_feats, name='rdk_fp_idx')
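# A minimal sketch (hypothetical helper, not used by RDKFeaturizer or any
# other featurizer here) of how the `target_density` and `min_size`
# arguments are assumed to interact when RDKit folds an RDK fingerprint:
# the bit vector is halved, bit j of the result being set if bit j or bit
# j + n/2 was set, for as long as its density is below the target and the
# halved vector would still hold at least `min_size` bits.
# For example, _fold_to_density([1, 0, 0, 0, 0, 0, 0, 0], 0.5, 1) folds
# twice and gives [1, 0].
def _fold_to_density(bits, target_density=0.0, min_size=128):
    """ Fold a list of 0/1 bits in half until `target_density` is reached.

    Hypothetical illustration; assumes the fold-by-two rule described above.

    Args:
        bits (list[int]):
            The unfolded fingerprint, as 0/1 integers.
        target_density (float):
            The fraction of set bits to fold towards.
        min_size (int):
            The length below which the fingerprint is not folded.

    Returns:
        list[int]: The folded fingerprint.
    """
    bits = list(bits)
    # only fold even-length vectors that stay at least `min_size` long
    while (len(bits) >= max(2 * min_size, 2) and len(bits) % 2 == 0 and
           float(sum(bits)) / len(bits) < target_density):
        half = len(bits) // 2
        # OR the two halves together, bit by bit
        bits = [a | b for a, b in zip(bits[:half], bits[half:])]
    return bits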