Completed
Push — master ( c7f259...9726b9 )
by Rich
01:31
created

Fingerprinter._transform_sparse()   D

Complexity

Conditions 8

Size

Total Lines 23

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
cc 8
c 0
b 0
f 0
dl 0
loc 23
rs 4.7619
#! /usr/bin/env python
#
# Copyright (C) 2007-2009 Rich Lewis <[email protected]>
# License: 3-clause BSD

"""
## skchem.descriptors.fingerprints

Fingerprinting classes and associated functions are defined.
"""

from functools import wraps
from collections import Iterable
Unused Code introduced by
Unused Iterable imported from collections
import pandas as pd
Configuration introduced by
The import pandas could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint; when installing your modules, make sure to use the command for the correct Python version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.
from rdkit.Chem import DataStructs, GetDistanceMatrix
Configuration introduced by
The import rdkit.Chem could not be resolved.
from rdkit.DataStructs import ConvertToNumpyArray
Configuration introduced by
The import rdkit.DataStructs could not be resolved.
from rdkit.Chem.rdMolDescriptors import (GetMorganFingerprint,
Configuration introduced by
The import rdkit.Chem.rdMolDescriptors could not be resolved.
                                         GetHashedMorganFingerprint,
                                         GetMorganFingerprintAsBitVect,
                                         GetAtomPairFingerprint,
                                         GetHashedAtomPairFingerprint,
                                         GetHashedAtomPairFingerprintAsBitVect,
                                         GetTopologicalTorsionFingerprint,
                                         GetHashedTopologicalTorsionFingerprint,
                                         GetHashedTopologicalTorsionFingerprintAsBitVect,
                                         GetMACCSKeysFingerprint,
                                         GetFeatureInvariants,
                                         GetConnectivityInvariants)
from rdkit.Chem.rdReducedGraphs import GetErGFingerprint
Configuration introduced by
The import rdkit.Chem.rdReducedGraphs could not be resolved.
from rdkit.Chem.rdmolops import RDKFingerprint
Configuration introduced by
The import rdkit.Chem.rdmolops could not be resolved.

import numpy as np
Configuration introduced by
The import numpy could not be resolved.
import skchem

def skchemize(func, columns=None, *args, **kwargs):
    """

    transform an RDKit fingerprinting function to work well with pandas

    >>> from rdkit import Chem
    >>> import skchem
    >>> from skchem.descriptors import skchemize
    >>> from skchem.core import Mol
    >>> f = skchemize(Chem.RDKFingerprint)
    >>> m = Mol.from_smiles('c1ccccc1')
    >>> f(m)
    0       0
    1       0
    2       0
    3       0
    4       0
    5       0
    6       0
    7       0
    8       0
    9       0
    10      0
    11      0
    12      0
    13      0
    14      0
    15      0
    16      0
    17      0
    18      0
    19      0
    20      0
    21      0
    22      0
    23      0
    24      0
    25      0
    26      0
    27      0
    28      0
    29      0
           ..
    2018    0
    2019    0
    2020    0
    2021    0
    2022    0
    2023    0
    2024    0
    2025    0
    2026    0
    2027    0
    2028    0
    2029    0
    2030    0
    2031    0
    2032    0
    2033    0
    2034    0
    2035    0
    2036    0
    2037    0
    2038    0
    2039    0
    2040    0
    2041    0
    2042    0
    2043    0
    2044    0
    2045    0
    2046    0
    2047    0
    dtype: int64
    >>> from skchem.data import resource
    >>> df = skchem.read_sdf(resource('test_sdf', 'multi_molecule-simple.sdf'))
    >>> df.structure.apply(f) # doctest: +NORMALIZE_WHITESPACE
          0     1     2     3     4     5     6     7     8     9     ...   2038
    name                                                              ...
    297      0     0     0     0     0     0     0     0     0     0  ...      0
    6324     0     0     0     0     0     0     0     0     0     0  ...      0
    6334     0     0     0     0     0     0     0     0     0     0  ...      0
    <BLANKLINE>
          2039  2040  2041  2042  2043  2044  2045  2046  2047
    name
    297      0     0     0     0     0     0     0     0     0
    6324     0     0     0     0     0     0     0     0     0
    6334     0     0     0     0     0     0     0     0     0
    <BLANKLINE>
    [3 rows x 2048 columns]

    """
    @wraps(func)
    def func_wrapper(m):
Coding Style Naming introduced by
The name m does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.

        """ Function that wraps an rdkit function allowing it to produce dataframes. """

        arr = np.array(0)
        DataStructs.ConvertToNumpyArray(func(m, *args, **kwargs), arr)
Coding Style introduced by
Usage of * or ** arguments should usually be done with care.

Generally, there is nothing wrong with usage of * or ** arguments. For readability of the code base, we suggest to not over-use these language constructs though.

For more information, we can recommend this blog post from Ned Batchelder including its comments which also touches this aspect.

        return pd.Series(arr, index=columns)

    return func_wrapper
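
The wrapping pattern used by skchemize can be sketched without RDKit or pandas. In this minimal sketch, `listify` and `toy_fingerprint` are hypothetical names that do not appear in skchem; the only point is that extra arguments are captured when the wrapper is created and forwarded on every call, while `wraps` preserves the wrapped function's metadata.

```python
from functools import wraps


def listify(func, *args, **kwargs):
    """Hypothetical stand-in for skchemize: wrap func so it returns a list."""

    @wraps(func)
    def func_wrapper(obj):
        # Extra arguments were fixed when the wrapper was created.
        return list(func(obj, *args, **kwargs))

    return func_wrapper


def toy_fingerprint(text, n_bits=4):
    """Hypothetical fingerprint: set bit (ord(char) % n_bits) per character."""
    bits = [0] * n_bits
    for char in text:
        bits[ord(char) % n_bits] = 1
    return tuple(bits)


f = listify(toy_fingerprint, n_bits=8)
result = f('CCO')
```

Because the keyword arguments are bound once, the wrapped function takes a single argument, which is the shape `Series.apply` expects.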


class Fingerprinter(object):

    """ Fingerprinter Base class. """

    def __init__(self, func, sparse=False, name=None):

        """ A generic fingerprinter.  Create with a function.

        Args:
            func (callable):
                A fingerprinting function that takes an skchem.Mol argument, and
                returns an iterable of values.
            sparse (bool):
                Whether `transform` should use the sparse code path.
                Default is `False`.
            name (str):
                The name of the fingerprints that are being calculated"""

        self.NAME = name
Coding Style Naming introduced by
The name NAME does not conform to the attribute naming conventions ([a-z_][a-z0-9_]{2,30}$).
        self.func = func
        self.sparse = sparse

    def __call__(self, obj):

        """ Call the fingerprinter directly.

        This is a shorthand for transform. """

        return self.transform(obj)

    def __add__(self, other):

        """ Add fingerprinters together to create a fusion fingerprinter.

        Fusion featurizers will transform molecules to series with all
        features from all component featurizers.
        """

        fpers = []
        for fper in (self, other):
            if isinstance(fper, FusionFingerprinter):
                fpers += fper.fingerprinters
Bug introduced by
The Instance of Fingerprinter does not seem to have a member named fingerprinters.

This check looks for calls to members that are non-existent. These calls will fail.

The member could have been renamed or removed.

            else:
                fpers.append(fper)

        return FusionFingerprinter(fpers)
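
The flattening behaviour of `__add__` can be illustrated in isolation. The classes `Fp` and `FusionFp` below are hypothetical stand-ins, not the skchem classes; the sketch only shows that chaining `+` yields one flat list of components rather than nested fusion objects.

```python
class Fp(object):
    """Hypothetical minimal fingerprinter carrying only a name."""

    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        parts = []
        for fper in (self, other):
            if isinstance(fper, FusionFp):
                # Unpack an existing fusion instead of nesting it.
                parts += fper.fingerprinters
            else:
                parts.append(fper)
        return FusionFp(parts)


class FusionFp(Fp):
    """Hypothetical fusion of several fingerprinters."""

    def __init__(self, fingerprinters):
        Fp.__init__(self, 'fusion')
        self.fingerprinters = fingerprinters


fused = Fp('morgan') + Fp('atom_pair') + Fp('maccs')
names = [fper.name for fper in fused.fingerprinters]
```

The `isinstance` guard is also why the "no member named fingerprinters" warning is only reachable for fusion instances in this sketch.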

    def fit(self, X, y):
Coding Style Naming introduced by
The name X does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style Naming introduced by
The name y does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style introduced by
This method should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below, you find an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend to read PEP-257: Docstring Conventions.

Unused Code introduced by
The argument y seems to be unused.
Unused Code introduced by
The argument X seems to be unused.
        return self

    def transform(self, obj):

        """ Calculate a fingerprint for the given object.

        Args:
            obj (skchem.Mol or pd.Series or pd.DataFrame or iterable):
                The object to be transformed.

        Returns:
            pd.DataFrame:
                The produced features.
        """

        if self.sparse:
            return self._transform_sparse(obj)
        else:
            return self._transform_dense(obj)

    def _transform_dense(self, obj):

        """ Calculate a dense fingerprint for the given object. """

        if isinstance(obj, skchem.Mol):
            return pd.Series(self._transform(obj), index=self.index)

        elif isinstance(obj, pd.DataFrame):
            return self.transform(obj.structure)

        elif isinstance(obj, pd.Series):
            # Positional access; the deprecated .ix indexer is avoided here.
            res_0 = self._transform(obj.iloc[0])
            res = np.zeros((len(obj), len(res_0)))
            for i, mol in enumerate(obj):
                res[i] = self._transform(mol)
            return pd.DataFrame(res, index=obj.index, columns=self.index)

        elif isinstance(obj, (tuple, list)):
            res_0 = self._transform(obj[0])
            res = np.zeros((len(obj), len(res_0)))
            for i, mol in enumerate(obj):
                res[i] = self._transform(mol)

            idx = pd.Index([mol.name for mol in obj], name='name')
            return pd.DataFrame(res, index=idx, columns=self.index)

        else:
            raise NotImplementedError

    def _transform_sparse(self, obj):

        """ Calculate a sparse fingerprint for the given object. """

        if isinstance(obj, skchem.Mol):
            return pd.Series(self._transform(obj), index=self.index)

        elif isinstance(obj, pd.DataFrame):
            return self.transform(obj.structure)

        elif isinstance(obj, pd.Series):
            # A pandas Series exposes its labels as .index (there is no .idx).
            return pd.DataFrame([self.transform(m) for m in obj],
                                index=obj.index,
                                columns=self.index).fillna(0)

        elif isinstance(obj, (tuple, list)):
            idx = pd.Index([mol.name for mol in obj], name='name')
            return pd.DataFrame([self.transform(m) for m in obj],
                                index=idx,
                                columns=self.index)

        else:
            raise NotImplementedError
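
The sparse path relies on rows with different feature keys being aligned into one table, with absent features filled with zero. A dependency-free sketch of that alignment follows; `densify` is a hypothetical helper, and sorting the columns is an assumption of this sketch rather than a statement about the column order pandas produces.

```python
def densify(sparse_rows):
    """Align sparse {feature: value} dicts on the union of their keys."""
    # Iterating a dict yields its keys, so union() collects all feature ids.
    columns = sorted(set().union(*sparse_rows))
    # Features missing from a row become 0, mirroring fillna(0).
    table = [[row.get(col, 0) for col in columns] for row in sparse_rows]
    return columns, table


rows = [{12: 1, 907: 2}, {907: 1, 4411: 3}]
columns, table = densify(rows)
```

Note that in the listing above only the `pd.Series` branch calls `fillna(0)`; the tuple/list branch leaves missing features as NaN.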

    def _transform(self, mol):

        """ Calculate the fingerprint on a molecule. """

        return pd.Series(list(self.func(mol)), name=mol.name)

    @property
    def index(self):
Coding Style introduced by
This method could be written as a function/class method.

If a method does not access any attributes of the class, it could also be implemented as a function or static method. This can help improve readability. For example

class Foo:
    def some_method(self, x, y):
        return x + y;

could be written as

class Foo:
    @classmethod
    def some_method(cls, x, y):
        return x + y;

        """ The index to use. """

        return None

class FusionFingerprinter(Fingerprinter):
Coding Style introduced by
This class should have a docstring.

    def __init__(self, fingerprinters):
Bug introduced by
The __init__ method of the super-class Fingerprinter is not called.

It is generally advisable to initialize the super-class by calling its __init__ method:

class SomeParent:
    def __init__(self):
        self.x = 1

class SomeChild(SomeParent):
    def __init__(self):
        # Initialize the super class
        SomeParent.__init__(self)

        self.fingerprinters = fingerprinters

    def transform(self, obj):

        if isinstance(obj, skchem.Mol):
            return pd.concat([fp.transform(obj) for fp in self.fingerprinters],
                             keys=[fp.NAME for fp in self.fingerprinters])

        elif isinstance(obj, pd.DataFrame):
            return pd.concat([fp.transform(obj) for fp in self.fingerprinters],
                             keys=[fp.NAME for fp in self.fingerprinters],
                             axis=1)

        elif isinstance(obj, pd.Series):
            return pd.concat([fp.transform(obj.structure) \
                                for fp in self.fingerprinters],
                             keys=[fp.NAME for fp in self.fingerprinters],
                             axis=1)

        else:
            raise NotImplementedError

    def _transform(self, mol):

        return pd.concat([fp.transform(mol) for fp in self.fingerprinters])

class MorganFingerprinter(Fingerprinter):

    """ Morgan Fingerprint Transformer. """

    NAME = 'morgan'

    def __init__(self, radius=2, n_feats=2048, as_bits=True,
                 use_features=False, use_bond_types=True, use_chirality=False):
Bug introduced by
The __init__ method of the super-class Fingerprinter is not called.
best-practice introduced by
Too many arguments (7/5)

        """
        Args:
            radius (int):
                The maximum radius for atom environments.
                Default is `2`.
            n_feats (int):
                The number of features to which to fold the fingerprint down.
                For unfolded, use `-1`.
                Default is `2048`.
            as_bits (bool):
                Whether to return bits (`True`) or counts (`False`).
                Default is `True`.
            use_features (bool):
                Whether to map atom types to generic features (FCFP analog).
                Default is `False`.
            use_bond_types (bool):
                Whether to use bond types to differentiate environments.
                Default is `True`.
            use_chirality (bool):
                Whether to use chirality to differentiate environments.
                Default is `False`.

        Notes:
            Currently, folded bits are by far the fastest implementation.
        """

        self.radius = radius
        self.n_feats = n_feats
        self.sparse = self.n_feats < 0
        self.as_bits = as_bits
        self.use_features = use_features
        self.use_bond_types = use_bond_types
        self.use_chirality = use_chirality

    def _transform(self, mol):
Duplication introduced by
This code seems to be duplicated in your project.

        """Private method to transform a skchem molecule.

        Use `transform` for the public method, which genericizes the argument to
        iterables of mols.

        Args:
            mol (skchem.Mol): Molecule to calculate fingerprint for.

        Returns:
            np.array or dict:
                Fingerprint as an array (or a dict if sparse).
        """

        if self.as_bits and self.n_feats > 0:

            fp = GetMorganFingerprintAsBitVect(mol, self.radius,
                                           nBits=self.n_feats,
                                           useFeatures=self.use_features,
                                           useBondTypes=self.use_bond_types,
                                           useChirality=self.use_chirality)
            res = np.array(0)
            # ConvertToNumpyArray fills res in place and returns None,
            # so the dtype conversion is applied to res afterwards.
            ConvertToNumpyArray(fp, res)
            res = res.astype(np.uint8)
Coding Style Naming introduced by
The name fp does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style introduced by
Wrong continued indentation.
useFeatures=self.use_features,
useBondTypes=self.use_bond_types,
useChirality=self.use_chirality)

        else:

            if self.n_feats <= 0:

                res = GetMorganFingerprint(mol, self.radius,
                                           useFeatures=self.use_features,
                                           useBondTypes=self.use_bond_types,
                                           useChirality=self.use_chirality)
                res = res.GetNonzeroElements()
                if self.as_bits:
                    res = {k: int(v > 0) for k, v in res.items()}

            else:
                res = GetHashedMorganFingerprint(mol, self.radius,
                                                 nBits=self.n_feats,
                                                 useFeatures=self.use_features,
                                                 useBondTypes=self.use_bond_types,
                                                 useChirality=self.use_chirality)
                res = np.array(list(res))

        return res
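
The relationship between the unfolded (dict) and folded (fixed-length) results can be sketched as follows. Folding by modulo is a common scheme and is used here as an assumption; the source does not state how RDKit folds internally, and `fold` is a hypothetical helper, not an RDKit or skchem function.

```python
def fold(counts, n_feats, as_bits=False):
    """Fold sparse {feature_id: count} into a list of length n_feats."""
    folded = [0] * n_feats
    for feature_id, count in counts.items():
        # Assumed folding rule: colliding ids accumulate in the same slot.
        folded[feature_id % n_feats] += count
    if as_bits:
        # Same count-to-bit conversion as in the unfolded branch above.
        folded = [int(value > 0) for value in folded]
    return folded


unfolded = {3: 1, 11: 2, 20: 1}
folded_counts = fold(unfolded, 8)
folded_bits = fold(unfolded, 8, as_bits=True)
```

Here ids 3 and 11 collide in slot 3, which is the information loss that the unfolded (`n_feats=-1`) mode avoids.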

    def grad(self, mol):

        """ Calculate the pseudo gradient with respect to the atoms.

        The pseudo gradient is the number of times the atom set that particular
        bit.

        Args:
            mol (skchem.Mol):
                The molecule for which to calculate the pseudo gradient.

        Returns:
            pandas.DataFrame:
                Dataframe of pseudogradients, with columns corresponding to
                atoms, and rows corresponding to features of the fingerprint.
        """

        cols = pd.Index(list(range(len(mol.atoms))), name='atoms')
        dist = GetDistanceMatrix(mol)

        info = {}

        if self.n_feats < 0:

            res = GetMorganFingerprint(mol, self.radius,
                                       useFeatures=self.use_features,
                                       useBondTypes=self.use_bond_types,
                                       useChirality=self.use_chirality,
                                       bitInfo=info).GetNonzeroElements()
            idx_list = list(res.keys())
            idx = pd.Index(idx_list, name='features')
            grad = np.zeros((len(idx), len(cols)))
            for bit in info:
                for atom_idx, radius in info[bit]:
                    grad[idx_list.index(bit)] += (dist <= radius)[atom_idx]

        else:

            res = list(GetHashedMorganFingerprint(mol, self.radius,
                                        nBits=self.n_feats,
                                        useFeatures=self.use_features,
                                        useBondTypes=self.use_bond_types,
                                        useChirality=self.use_chirality,
                                        bitInfo=info))
Coding Style introduced by
Wrong continued indentation.
nBits=self.n_feats,
useFeatures=self.use_features,
useBondTypes=self.use_bond_types,
useChirality=self.use_chirality,
bitInfo=info))
            idx = pd.Index(range(self.n_feats), name='features')
            grad = np.zeros((len(idx), len(cols)))

            for bit in info:
                for atom_idx, radius in info[bit]:
                    grad[bit] += (dist <= radius)[atom_idx]

        grad = pd.DataFrame(grad, index=idx, columns=cols)

        if self.as_bits:
            grad = (grad > 0)

        return grad.astype(int)
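
The pseudo gradient accumulation in `grad` can be traced on a toy input without RDKit. In this sketch the distance matrix and the `bit_info` mapping are hand-written, hypothetical inputs in the same shape the method consumes (bit -> list of (atom index, radius) pairs), and `pseudo_gradient` is an illustrative helper, not part of skchem.

```python
def pseudo_gradient(dist, bit_info, n_feats):
    """Count, per bit, how many setting environments contain each atom."""
    n_atoms = len(dist)
    grad = [[0] * n_atoms for _ in range(n_feats)]
    for bit, environments in bit_info.items():
        for atom_idx, radius in environments:
            for other in range(n_atoms):
                # An atom belongs to the environment if it lies within
                # `radius` bonds of the centre atom.
                if dist[atom_idx][other] <= radius:
                    grad[bit][other] += 1
    return grad


# Topological distances for a three-atom chain 0-1-2.
dist = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]
# Bit 0 is set by atom 0 at radius 0; bit 2 by atom 1 at radius 1.
bit_info = {0: [(0, 0)], 2: [(1, 1)]}
grad = pseudo_gradient(dist, bit_info, n_feats=4)
```

Rows are features and columns are atoms, matching the layout described in the docstring of `grad`.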

class AtomPairFingerprinter(Fingerprinter):

    """ Atom Pair Transformer. """

    NAME = 'atom_pair'

    def __init__(self, min_length=1, max_length=30, n_feats=2048, as_bits=False,
                 use_chirality=False):
Bug introduced by
The __init__ method of the super-class Fingerprinter is not called.
best-practice introduced by
Too many arguments (6/5)

        """ Instantiate an atom pair fingerprinter.

        Args:
            min_length (int):
                The minimum length of paths between pairs.
                Default is `1`, i.e. pairs can be bonded together.
            max_length (int):
                The maximum length of paths between pairs.
                Default is `30`.
            n_feats (int):
                The number of features to which to fold the fingerprint down.
                For unfolded, use `-1`.
                Default is `2048`.
            as_bits (bool):
                Whether to return bits (`True`) or counts (`False`).
                Default is `False`.
            use_chirality (bool):
                Whether to use chirality to differentiate environments.
                Default is `False`.
        """

        self.min_length = min_length
        self.max_length = max_length
        self.n_feats = n_feats
        self.sparse = self.n_feats < 0
        self.as_bits = as_bits
        self.use_chirality = use_chirality

    def _transform(self, mol):
Duplication introduced by
This code seems to be duplicated in your project.

        """Private method to transform a skchem molecule.

        Use `transform` for the public method, which genericizes the argument to
        iterables of mols.

        Args:
            mol (skchem.Mol): Molecule to calculate fingerprint for.

        Returns:
            np.array or dict:
                Fingerprint as an array (or a dict if sparse).
        """

        if self.as_bits and self.n_feats > 0:

            fp = GetHashedAtomPairFingerprintAsBitVect(mol, nBits=self.n_feats,
                                           minLength=self.min_length,
                                           maxLength=self.max_length,
                                           includeChirality=self.use_chirality)
            res = np.array(0)
            # ConvertToNumpyArray fills res in place and returns None,
            # so the dtype conversion is applied to res afterwards.
            ConvertToNumpyArray(fp, res)
            res = res.astype(np.uint8)
Coding Style Naming introduced by
The name fp does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style introduced by
Wrong continued indentation.
minLength=self.min_length,
maxLength=self.max_length,
includeChirality=self.use_chirality)

        else:

            if self.n_feats <= 0:

                # Unhashed fingerprints are not folded, so nBits is not passed.
                res = GetAtomPairFingerprint(
                    mol,
                    minLength=self.min_length,
                    maxLength=self.max_length,
                    includeChirality=self.use_chirality)
                res = res.GetNonzeroElements()
                if self.as_bits:
                    res = {k: int(v > 0) for k, v in res.items()}

            else:
                res = GetHashedAtomPairFingerprint(
                    mol, nBits=self.n_feats,
                    minLength=self.min_length,
                    maxLength=self.max_length,
                    includeChirality=self.use_chirality)
                res = np.array(list(res))

        return res
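
The branches of `_transform` return one of several shapes depending on `n_feats` and `as_bits`. The sketch below reproduces only that post-processing on a plain dict of counts, without RDKit. `postprocess_counts` is a hypothetical helper, and folding by modulo is an assumption made for illustration; RDKit's hashed fingerprints use their own hashing.

```python
def postprocess_counts(counts, n_feats=-1, as_bits=False):
    """Shape a {feature_id: count} mapping the way `_transform` branches.

    Hypothetical helper: modulo folding stands in for RDKit's hashing.
    """
    if n_feats <= 0:
        # Unfolded: keep the sparse mapping, optionally binarized.
        if as_bits:
            return {k: int(v > 0) for k, v in counts.items()}
        return dict(counts)

    # Folded: accumulate counts into a fixed-length dense list.
    dense = [0] * n_feats
    for feature_id, count in counts.items():
        dense[feature_id % n_feats] += count
    if as_bits:
        dense = [int(v > 0) for v in dense]
    return dense


counts = {3: 2, 10: 1, 11: 4}
print(postprocess_counts(counts))                # sparse counts
print(postprocess_counts(counts, as_bits=True))  # sparse bits
print(postprocess_counts(counts, n_feats=8))     # dense counts
```

In the sparse case the result stays a dict keyed by feature id, which is why the docstring advertises "np.array or dict" as the return type.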


class TopologicalTorsionFingerprinter(Fingerprinter):

    """ Topological torsion fingerprints """

    NAME = 'topological_torsion'

    # Review note (pylint): Fingerprinter.__init__ is not called here.
    def __init__(self, target_size=4, n_feats=2048, as_bits=False,
                 use_chirality=False):

        """
        Args:
            target_size (int):
                The number of atoms in each torsion path.
                Default is `4`.
            n_feats (int):
                The number of features to which to fold the fingerprint down.
                For unfolded, use `-1`.
                Default is `2048`.
            as_bits (bool):
                Whether to return bits (`True`) or counts (`False`).
                Default is `False`.
            use_chirality (bool):
                Whether to use chirality to differentiate environments.
                Default is `False`.
        """

        self.target_size = target_size
        self.n_feats = n_feats
        self.sparse = self.n_feats < 0
        self.as_bits = as_bits
        self.use_chirality = use_chirality

    def _transform(self, mol):
        """ Private method to transform a skchem molecule.

        Args:
            mol (skchem.Mol): Molecule to calculate fingerprint for.

        Returns:
            np.array or dict:
                Fingerprint as an array (or a dict if sparse).
        """

        if self.as_bits and self.n_feats > 0:

            bitvect = GetHashedTopologicalTorsionFingerprintAsBitVect(
                mol, nBits=self.n_feats,
                targetSize=self.target_size,
                includeChirality=self.use_chirality)
            # ConvertToNumpyArray fills `res` in place and returns None, so
            # the dtype is set on the target array rather than via astype.
            res = np.zeros((0,), dtype=np.uint8)
            ConvertToNumpyArray(bitvect, res)

        else:

            if self.n_feats <= 0:

                # Unhashed fingerprints are not folded, so nBits is not passed.
                res = GetTopologicalTorsionFingerprint(
                    mol,
                    targetSize=self.target_size,
                    includeChirality=self.use_chirality)
                res = res.GetNonzeroElements()
                if self.as_bits:
                    res = {k: int(v > 0) for k, v in res.items()}

            else:
                res = GetHashedTopologicalTorsionFingerprint(
                    mol, nBits=self.n_feats,
                    targetSize=self.target_size,
                    includeChirality=self.use_chirality)
                res = np.array(list(res))

        return res
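
One edge case in the class above may be worth checking: `__init__` sets `sparse` with a strict `n_feats < 0`, while `_transform` takes the unfolded (dict-returning) branch for `n_feats <= 0`. The sketch below compares just those two predicates; the function names are hypothetical and exist only for this illustration.

```python
def is_sparse_flag(n_feats):
    # Mirrors `self.sparse = self.n_feats < 0` in __init__.
    return n_feats < 0


def takes_unfolded_branch(n_feats):
    # Mirrors the `if self.n_feats <= 0` test in _transform.
    return n_feats <= 0


for n_feats in (-1, 0, 2048):
    print(n_feats, is_sparse_flag(n_feats), takes_unfolded_branch(n_feats))
```

The two predicates disagree only at `n_feats == 0`, where a dict would be returned while `sparse` is `False`. Whether that value is ever passed in practice is not stated in the source.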


class MACCSKeysFingerprinter(Fingerprinter):

    """ MACCS Keys Fingerprints """

    NAME = 'maccs'

    # Review note (pylint): Fingerprinter.__init__ is not called here.
    def __init__(self):
        self.sparse = False

    def _transform(self, mol):

        return np.array(list(GetMACCSKeysFingerprint(mol)))


class ErGFingerprinter(Fingerprinter):

    """ ErG Fingerprints """

    NAME = 'erg'

    # Review note (pylint): Fingerprinter.__init__ is not called here.
    def __init__(self):
        self.sparse = False

    def _transform(self, mol):

        return np.array(GetErGFingerprint(mol))


class FeatureInvariantsFingerprinter(Fingerprinter):

    """ Feature invariants fingerprints. """

    NAME = 'feat_inv'

    # Review note (pylint): Fingerprinter.__init__ is not called here.
    def __init__(self):
        self.sparse = False

    def _transform(self, mol):

        return np.array(GetFeatureInvariants(mol))


class ConnectivityInvariantsFingerprinter(Fingerprinter):

    """ Connectivity invariants fingerprints """

    NAME = 'conn_inv'

    # Review note (pylint): Fingerprinter.__init__ is not called here.
    def __init__(self):
        self.sparse = False

    def _transform(self, mol):

        return np.array(GetConnectivityInvariants(mol))


# Review note (pylint): too many instance attributes (10/7).
class RDKFingerprinter(Fingerprinter):

    """ RDKit fingerprint """

    NAME = 'rdk'

    # Review notes (pylint): too many arguments (10/5);
    # Fingerprinter.__init__ is not called here.
    def __init__(self, min_path=1, max_path=7, n_feats=2048, n_bits_per_hash=2,
                 use_hs=True, target_density=0.0, min_size=128,
                 branched_paths=True, use_bond_types=True):

        """ RDK fingerprints

        Args:
            min_path (int):
                The minimum number of bonds in the paths considered.
                Default is `1`.

            max_path (int):
                The maximum number of bonds in the paths considered.
                Default is `7`.

            n_feats (int):
                The number of features to which to fold the fingerprint down.
                For unfolded, use `-1`.
                Default is `2048`.

            n_bits_per_hash (int):
                The number of bits to set per path.
                Default is `2`.

            use_hs (bool):
                Whether to include hydrogen counts in the path hashes.
                Default is `True`.

            target_density (float):
                Fold the fingerprint until this bit density is reached.
                Default is `0.0`.

            min_size (int):
                The minimum size to fold to when reaching `target_density`.
                Default is `128`.

            branched_paths (bool):
                Whether to use branched as well as unbranched paths.
                Default is `True`.

            use_bond_types (bool):
                Whether to use bond orders in the path hashes.
                Default is `True`.
        """

        # Store the arguments themselves; the original assigned the literal
        # defaults, so values passed by the caller were silently ignored.
        self.min_path = min_path
        self.max_path = max_path
        self.n_feats = n_feats
        self.sparse = False
        self.n_bits_per_hash = n_bits_per_hash
        self.use_hs = use_hs
        self.target_density = target_density
        self.min_size = min_size
        self.branched_paths = branched_paths
        self.use_bond_types = use_bond_types

    def _transform(self, mol):

        # np.array takes no `name` keyword, so the stray `name=mol.name`
        # argument has been dropped.
        return np.array(list(RDKFingerprint(mol, minPath=self.min_path,
                                            maxPath=self.max_path,
                                            fpSize=self.n_feats,
                                            nBitsPerHash=self.n_bits_per_hash,
                                            useHs=self.use_hs,
                                            tgtDensity=self.target_density,
                                            minSize=self.min_size,
                                            branchedPaths=self.branched_paths,
                                            useBondOrder=self.use_bond_types)))
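
A constructor like `RDKFingerprinter.__init__` is only configurable if each argument is stored on the instance rather than replaced by a literal default. The sketch below shows that pattern on a reduced, hypothetical class (`PathFingerprintConfig` is not part of skchem). It also maps the stored attributes onto the keyword names that `_transform` passes to `RDKFingerprint`.

```python
class PathFingerprintConfig(object):
    """Hypothetical stand-in for the parameter handling in RDKFingerprinter."""

    def __init__(self, min_path=1, max_path=7, n_feats=2048):
        # Store what the caller passed, not the literal defaults.
        self.min_path = min_path
        self.max_path = max_path
        self.n_feats = n_feats

    def rdkit_kwargs(self):
        # Map the snake_case attributes onto the keyword names used
        # in the RDKFingerprint call.
        return {'minPath': self.min_path,
                'maxPath': self.max_path,
                'fpSize': self.n_feats}


config = PathFingerprintConfig(max_path=5, n_feats=1024)
print(config.rdkit_kwargs())
```

Collecting the keyword arguments in one method keeps the call site short and makes the stored parameters easy to test without RDKit installed.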