1 | #! /usr/bin/env python |
||
2 | # |
||
3 | # Copyright (C) 2007-2009 Rich Lewis <[email protected]> |
||
4 | # License: 3-clause BSD |
||
5 | |||
6 | 1 | """ |
|
7 | ## skchem.descriptors.fingerprints |
||
8 | |||
9 | Defines fingerprinting classes and associated functions. |
||
10 | """ |
||
11 | |||
12 | 1 | import pandas as pd |
|
13 | 1 | from rdkit.Chem import GetDistanceMatrix |
|
14 | 1 | from rdkit.DataStructs import ConvertToNumpyArray |
|
15 | 1 | from rdkit.Chem.rdMolDescriptors import ( |
|
16 | GetMorganFingerprint, |
||
17 | GetHashedMorganFingerprint, |
||
18 | GetMorganFingerprintAsBitVect, |
||
19 | GetAtomPairFingerprint, |
||
20 | GetHashedAtomPairFingerprint, |
||
21 | GetHashedAtomPairFingerprintAsBitVect, |
||
22 | GetTopologicalTorsionFingerprint, |
||
23 | GetHashedTopologicalTorsionFingerprint, |
||
24 | GetHashedTopologicalTorsionFingerprintAsBitVect, |
||
25 | GetMACCSKeysFingerprint, |
||
26 | GetFeatureInvariants, |
||
27 | GetConnectivityInvariants) |
||
28 | 1 | from rdkit.Chem.rdReducedGraphs import GetErGFingerprint |
|
29 | 1 | from rdkit.Chem.rdmolops import RDKFingerprint |
|
30 | |||
31 | 1 | import numpy as np |
|
32 | 1 | from ..base import Transformer, Featurizer |
|
33 | |||
34 | |||
35 | 1 | class MorganFeaturizer(Transformer, Featurizer): |
|
36 | """ Morgan fingerprints, implemented by RDKit. |
||
37 | |||
38 | Notes: |
||
39 | |||
40 | Currently, folded bits are by far the fastest implementation. |
||
41 | |||
42 | Because calculation is fast, the current parallel code is unlikely to |
||
43 | give a speedup: more time is spent moving data between processes than |
||
44 | calculating in a single process. |
||
45 | |||
46 | Examples: |
||
47 | |||
48 | >>> import skchem |
||
49 | >>> import pandas as pd |
||
50 | >>> pd.options.display.max_rows = pd.options.display.max_columns = 5 |
||
51 | |||
52 | >>> mf = skchem.features.MorganFeaturizer() |
||
53 | >>> m = skchem.Mol.from_smiles('CCC') |
||
54 | |||
55 | Can transform an individual molecule to yield a Series: |
||
56 | |||
57 | >>> mf.transform(m) |
||
58 | morgan_fp_idx |
||
59 | 0 0 |
||
60 | 1 0 |
||
61 | .. |
||
62 | 2046 0 |
||
63 | 2047 0 |
||
64 | Name: MorganFeaturizer, dtype: uint8 |
||
65 | |||
66 | Can transform a list of molecules to yield a DataFrame: |
||
67 | |||
68 | >>> mf.transform([m]) |
||
69 | morgan_fp_idx 0 1 ... 2046 2047 |
||
70 | 0 0 0 ... 0 0 |
||
71 | <BLANKLINE> |
||
72 | [1 rows x 2048 columns] |
||
73 | |||
74 | Change the number of features the fingerprint is folded down to using |
||
75 | `n_feats`. |
||
76 | |||
77 | >>> mf.n_feats = 1024 |
||
78 | >>> mf.transform(m) |
||
79 | morgan_fp_idx |
||
80 | 0 0 |
||
81 | 1 0 |
||
82 | .. |
||
83 | 1022 0 |
||
84 | 1023 0 |
||
85 | Name: MorganFeaturizer, dtype: uint8 |
||
86 | |||
87 | Count fingerprints with `as_bits` = False |
||
88 | |||
89 | >>> mf.as_bits = False |
||
90 | >>> res = mf.transform(m); res[res > 0] |
||
91 | morgan_fp_idx |
||
92 | 33 2 |
||
93 | 80 1 |
||
94 | 294 2 |
||
95 | 320 1 |
||
96 | Name: MorganFeaturizer, dtype: int64 |
||
97 | |||
98 | Pseudo-gradient with `grad` shows which atoms contributed to which |
||
99 | feature. |
||
100 | |||
101 | >>> mf.grad(m)[res > 0] |
||
102 | atom_idx 0 1 2 |
||
103 | features |
||
104 | 33 1 0 1 |
||
105 | 80 0 1 0 |
||
106 | 294 1 2 1 |
||
107 | 320 1 1 1 |
||
108 | |||
109 | """ |
||
110 | 1 | def __init__(self, radius=2, n_feats=2048, as_bits=True, |
|
111 | use_features=False, use_bond_types=True, use_chirality=False, |
||
112 | n_jobs=1, verbose=True): |
||
113 | |||
114 | """ Initialize the fingerprinter object. |
||
115 | |||
116 | Args: |
||
117 | radius (int): |
||
118 | The maximum radius for atom environments. |
||
119 | Default is `2`. |
||
120 | |||
121 | n_feats (int): |
||
122 | The number of features to which to fold the fingerprint down. |
||
123 | For unfolded, use `-1`. |
||
124 | Default is `2048`. |
||
125 | |||
126 | as_bits (bool): |
||
127 | Whether to return bits (`True`) or counts (`False`). |
||
128 | Default is `True`. |
||
129 | |||
130 | use_features (bool): |
||
131 | Whether to map atom types to generic features (FCFP). |
||
132 | Default is `False`. |
||
133 | |||
134 | use_bond_types (bool): |
||
135 | Whether to use bond types to differentiate environments. |
||
136 | Default is `True`. |
||
137 | |||
138 | use_chirality (bool): |
||
139 | Whether to use chirality to differentiate environments. |
||
140 | Default is `False`. |
||
141 | |||
142 | n_jobs (int): |
||
143 | The number of processes to run the featurizer in. |
||
144 | |||
145 | verbose (bool): |
||
146 | Whether to output a progress bar. |
||
147 | |||
148 | """ |
||
149 | |||
150 | 1 | super(MorganFeaturizer, self).__init__(n_jobs=n_jobs, verbose=verbose) |
|
151 | 1 | self.radius = radius |
|
152 | 1 | self.n_feats = n_feats |
|
153 | 1 | self.sparse = self.n_feats < 0 |
|
154 | 1 | self.as_bits = as_bits |
|
155 | 1 | self.use_features = use_features |
|
156 | 1 | self.use_bond_types = use_bond_types |
|
157 | 1 | self.use_chirality = use_chirality |
|
158 | |||
159 | 1 | def _transform_mol(self, mol): |
|
160 | |||
161 | """Private method to transform a skchem molecule. |
||
162 | |||
163 | Use `transform` for the public method, which genericizes the argument |
||
164 | to iterables of mols. |
||
165 | |||
166 | Args: |
||
167 | mol (skchem.Mol): Molecule to calculate fingerprint for. |
||
168 | |||
169 | Returns: |
||
170 | np.array or dict: |
||
171 | Fingerprint as an array (or a dict if sparse). |
||
172 | """ |
||
173 | |||
174 | 1 | if self.as_bits and self.n_feats > 0: |
|
175 | |||
176 | 1 | fp = GetMorganFingerprintAsBitVect( |
|
177 | mol, self.radius, nBits=self.n_feats, |
||
178 | useFeatures=self.use_features, |
||
179 | useBondTypes=self.use_bond_types, |
||
180 | useChirality=self.use_chirality) |
||
181 | |||
182 | 1 | res = np.array(0) |
|
183 | 1 | ConvertToNumpyArray(fp, res) |
|
184 | 1 | res = res.astype(np.uint8) |
|
185 | |||
186 | else: |
||
187 | |||
188 | 1 | if self.n_feats <= 0: |
|
189 | |||
190 | res = GetMorganFingerprint( |
||
191 | mol, self.radius, |
||
192 | useFeatures=self.use_features, |
||
193 | useBondTypes=self.use_bond_types, |
||
194 | useChirality=self.use_chirality) |
||
195 | |||
196 | res = res.GetNonzeroElements() |
||
197 | if self.as_bits: |
||
198 | res = {k: int(v > 0) for k, v in res.items()} |
||
199 | |||
200 | else: |
||
201 | 1 | res = GetHashedMorganFingerprint( |
|
202 | mol, self.radius, nBits=self.n_feats, |
||
203 | useFeatures=self.use_features, |
||
204 | useBondTypes=self.use_bond_types, |
||
205 | useChirality=self.use_chirality) |
||
206 | |||
207 | 1 | res = np.array(list(res)) |
|
208 | |||
209 | 1 | return res |
|
210 | |||
211 | 1 | @property |
|
212 | def name(self): |
||
213 | return 'morg' |
||
214 | |||
215 | 1 | @property |
|
216 | def columns(self): |
||
217 | 1 | return pd.RangeIndex(self.n_feats, name='morgan_fp_idx') |
|
218 | |||
219 | 1 | def grad(self, mol): |
|
220 | |||
221 | """ Calculate the pseudo gradient with respect to the atoms. |
||
222 | |||
223 | The pseudo gradient is the number of times the atom set that particular |
||
224 | bit. |
||
225 | |||
226 | Args: |
||
227 | mol (skchem.Mol): |
||
228 | The molecule for which to calculate the pseudo gradient. |
||
229 | |||
230 | Returns: |
||
231 | pandas.DataFrame: |
||
232 | Dataframe of pseudogradients, with columns corresponding to |
||
233 | atoms, and rows corresponding to features of the fingerprint. |
||
234 | """ |
||
235 | |||
236 | 1 | cols = pd.Index(list(range(len(mol.atoms))), name='atom_idx') |
|
237 | 1 | dist = GetDistanceMatrix(mol) |
|
238 | |||
239 | 1 | info = {} |
|
240 | |||
241 | 1 | if self.n_feats < 0: |
|
242 | |||
243 | res = GetMorganFingerprint(mol, self.radius, |
||
244 | useFeatures=self.use_features, |
||
245 | useBondTypes=self.use_bond_types, |
||
246 | useChirality=self.use_chirality, |
||
247 | bitInfo=info).GetNonzeroElements() |
||
248 | idx_list = list(res.keys()) |
||
249 | idx = pd.Index(idx_list, name='features') |
||
250 | grad = np.zeros((len(idx), len(cols))) |
||
251 | for bit in info: |
||
252 | for atom_idx, radius in info[bit]: |
||
253 | grad[idx_list.index(bit)] += (dist <= radius)[atom_idx] |
||
254 | |||
255 | else: |
||
256 | |||
257 | 1 | GetHashedMorganFingerprint(mol, self.radius, nBits=self.n_feats, |
|
258 | useFeatures=self.use_features, |
||
259 | useBondTypes=self.use_bond_types, |
||
260 | useChirality=self.use_chirality, |
||
261 | bitInfo=info) |
||
262 | |||
263 | 1 | idx = pd.Index(range(self.n_feats), name='features') |
|
264 | 1 | grad = np.zeros((len(idx), len(cols))) |
|
265 | |||
266 | 1 | for bit in info: |
|
267 | 1 | for atom_idx, radius in info[bit]: |
|
268 | 1 | grad[bit] += (dist <= radius)[atom_idx] |
|
269 | |||
270 | 1 | grad = pd.DataFrame(grad, index=idx, columns=cols) |
|
271 | |||
272 | 1 | if self.as_bits: |
|
273 | grad = (grad > 0) |
||
274 | |||
275 | 1 | return grad.astype(int) |
|
276 | |||
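The pseudo-gradient computed by `grad` can be sketched without RDKit. This is a minimal illustration, not the library's implementation: `DIST` stands in for the result of `GetDistanceMatrix` on propane, and `BIT_INFO` is a hypothetical `bitInfo` dictionary whose (centre atom, radius) pairs were chosen by hand so that the result matches the docstring example above.

```python
# Hypothetical stand-in for GetDistanceMatrix(mol) on propane (CCC):
# topological distances between atoms 0, 1 and 2.
DIST = [
    [0, 1, 2],
    [1, 0, 1],
    [2, 1, 0],
]

# Hypothetical stand-in for the bitInfo dict: each feature index maps to the
# (centre atom, radius) environments that set it. Assumed values, not RDKit output.
BIT_INFO = {
    33: ((0, 0), (2, 0)),
    80: ((1, 0),),
    294: ((0, 1), (2, 1)),
    320: ((1, 1),),
}


def pseudo_gradient(dist, bit_info, as_bits=False):
    """Count, per feature, how often each atom lies inside an environment
    that set the feature (within `radius` bonds of the centre atom)."""
    n_atoms = len(dist)
    grad = {}
    for bit, environments in bit_info.items():
        row = [0] * n_atoms
        for centre, radius in environments:
            for atom in range(n_atoms):
                if dist[centre][atom] <= radius:
                    row[atom] += 1
        # With as_bits, only record whether the atom contributed at all.
        grad[bit] = [int(v > 0) for v in row] if as_bits else row
    return grad


GRAD = pseudo_gradient(DIST, BIT_INFO)
```

Under these assumed inputs, feature 294 receives contributions from two radius-1 environments that both contain the central atom, which is why that atom is counted twice.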
277 | |||
278 | 1 | class AtomPairFeaturizer(Transformer, Featurizer): |
|
279 | |||
280 | """ Atom Pair Fingerprints, implemented by RDKit. """ |
||
281 | |||
282 | 1 | def __init__(self, min_length=1, max_length=30, n_feats=2048, |
|
283 | as_bits=False, use_chirality=False, n_jobs=1, verbose=True): |
||
284 | |||
285 | """ Instantiate an atom pair fingerprinter. |
||
286 | |||
287 | Args: |
||
288 | min_length (int): |
||
289 | The minimum length of paths between pairs. |
||
290 | Default is `1`, i.e. pairs can be bonded together. |
||
291 | |||
292 | max_length (int): |
||
293 | The maximum length of paths between pairs. |
||
294 | Default is `30`. |
||
295 | |||
296 | n_feats (int): |
||
297 | The number of features to which to fold the fingerprint down. |
||
298 | For unfolded, use `-1`. |
||
299 | Default is `2048`. |
||
300 | |||
301 | as_bits (bool): |
||
302 | Whether to return bits (`True`) or counts (`False`). |
||
303 | Default is `False`. |
||
304 | |||
305 | use_chirality (bool): |
||
306 | Whether to use chirality to differentiate environments. |
||
307 | Default is `False`. |
||
308 | |||
309 | n_jobs (int): |
||
310 | The number of processes to run the featurizer in. |
||
311 | |||
312 | verbose (bool): |
||
313 | Whether to output a progress bar. |
||
314 | """ |
||
315 | |||
316 | super(AtomPairFeaturizer, self).__init__(n_jobs=n_jobs, |
||
317 | verbose=verbose) |
||
318 | self.min_length = min_length |
||
319 | self.max_length = max_length |
||
320 | self.n_feats = n_feats |
||
321 | self.sparse = self.n_feats < 0 |
||
322 | self.as_bits = as_bits |
||
323 | self.use_chirality = use_chirality |
||
324 | |||
325 | 1 | def _transform_mol(self, mol): |
|
326 | |||
327 | """Private method to transform a skchem molecule. |
||
328 | |||
329 | Use `transform` for the public method, which genericizes the argument to |
||
330 | iterables of mols. |
||
331 | |||
332 | Args: |
||
333 | mol (skchem.Mol): Molecule to calculate fingerprint for. |
||
334 | |||
335 | Returns: |
||
336 | np.array or dict: |
||
337 | Fingerprint as an array (or a dict if sparse). |
||
338 | """ |
||
339 | |||
340 | if self.as_bits and self.n_feats > 0: |
||
341 | |||
342 | fp = GetHashedAtomPairFingerprintAsBitVect( |
||
343 | mol, nBits=self.n_feats, minLength=self.min_length, |
||
344 | maxLength=self.max_length, includeChirality=self.use_chirality) |
||
345 | |||
346 | res = np.array(0) |
||
347 | ConvertToNumpyArray(fp, res) |
||
348 | res = res.astype(np.uint8) |
||
349 | |||
350 | else: |
||
351 | |||
352 | if self.n_feats <= 0: |
||
353 | |||
354 | res = GetAtomPairFingerprint( |
||
355 | mol, minLength=self.min_length, |
||
356 | maxLength=self.max_length, |
||
357 | includeChirality=self.use_chirality) |
||
358 | |||
359 | res = res.GetNonzeroElements() |
||
360 | if self.as_bits: |
||
361 | res = {k: int(v > 0) for k, v in res.items()} |
||
362 | |||
363 | else: |
||
364 | res = GetHashedAtomPairFingerprint( |
||
365 | mol, nBits=self.n_feats, minLength=self.min_length, |
||
366 | maxLength=self.max_length, |
||
367 | includeChirality=self.use_chirality) |
||
368 | |||
369 | res = np.array(list(res)) |
||
370 | |||
371 | return res |
||
372 | |||
373 | 1 | @property |
|
374 | def name(self): |
||
375 | return 'atom_pair' |
||
376 | |||
377 | 1 | @property |
|
378 | def columns(self): |
||
379 | return pd.RangeIndex(self.n_feats, name='ap_fp_idx') |
||
380 | |||
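The `n_feats` and `as_bits` options used by the featurizers above can be illustrated with a small, RDKit-free sketch. It assumes that folding maps a sparse feature identifier to `identifier % n_feats`; that is a common scheme used here only for illustration, and RDKit's hashed fingerprint functions may differ in detail. The names `fold_counts` and `to_bits` and the feature identifiers are hypothetical.

```python
def fold_counts(sparse_counts, n_feats):
    """Fold a sparse {feature_id: count} mapping into a dense list of
    length n_feats, summing the counts of identifiers that collide."""
    if n_feats <= 0:
        raise ValueError("n_feats must be positive to fold a fingerprint")
    dense = [0] * n_feats
    for feature_id, count in sparse_counts.items():
        dense[feature_id % n_feats] += count
    return dense


def to_bits(counts):
    """Binarize counts, mirroring the effect of as_bits=True."""
    return [int(count > 0) for count in counts]


# Made-up sparse (unfolded) fingerprint: identifiers 3 and 11 collide when
# folded to 8 features, because both leave remainder 3.
SPARSE = {3: 2, 11: 1, 1030: 3}
FOLDED = fold_counts(SPARSE, 8)
BITS = to_bits(FOLDED)
```

A smaller `n_feats` gives a more compact vector at the cost of more collisions, which is the trade-off behind choosing between folded and unfolded (`-1`) output.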
381 | |||
382 | 1 | class TopologicalTorsionFeaturizer(Transformer, Featurizer): |
|
383 | |||
384 | """ Topological Torsion fingerprints, implemented by RDKit. """ |
||
385 | |||
386 | 1 | def __init__(self, target_size=4, n_feats=2048, as_bits=False, |
|
387 | use_chirality=False, n_jobs=1, verbose=True): |
||
388 | |||
389 | """ Initialize a TopologicalTorsionFeaturizer object. |
||
390 | |||
391 | Args: |
||
392 | target_size (int): |
||
393 | The number of atoms in each torsion path. Default is `4`. |
||
394 | |||
395 | n_feats (int): |
||
396 | The number of features to which to fold the fingerprint down. |
||
397 | For unfolded, use `-1`. |
||
398 | Default is `2048`. |
||
399 | |||
400 | as_bits (bool): |
||
401 | Whether to return bits (`True`) or counts (`False`). |
||
402 | Default is `False`. |
||
403 | |||
404 | use_chirality (bool): |
||
405 | Whether to use chirality to differentiate environments. |
||
406 | Default is `False`. |
||
407 | n_jobs (int): |
||
408 | The number of processes to run the featurizer in. |
||
409 | |||
410 | verbose (bool): |
||
411 | Whether to output a progress bar. |
||
412 | """ |
||
413 | |||
414 | self.target_size = target_size |
||
415 | self.n_feats = n_feats |
||
416 | self.sparse = self.n_feats < 0 |
||
417 | self.as_bits = as_bits |
||
418 | self.use_chirality = use_chirality |
||
419 | super(TopologicalTorsionFeaturizer, self).__init__(n_jobs=n_jobs, |
||
420 | verbose=verbose) |
||
421 | |||
422 | 1 | def _transform_mol(self, mol): |
|
423 | """ Private method to transform a skchem molecule. |
||
424 | Args: |
||
425 | mol (skchem.Mol): Molecule to calculate fingerprint for. |
||
426 | |||
427 | Returns: |
||
428 | np.array or dict: |
||
429 | Fingerprint as an array (or a dict if sparse). |
||
430 | """ |
||
431 | |||
432 | if self.as_bits and self.n_feats > 0: |
||
433 | |||
434 | fp = GetHashedTopologicalTorsionFingerprintAsBitVect( |
||
435 | mol, nBits=self.n_feats, targetSize=self.target_size, |
||
436 | includeChirality=self.use_chirality) |
||
437 | |||
438 | res = np.array(0) |
||
439 | ConvertToNumpyArray(fp, res) |
||
440 | res = res.astype(np.uint8) |
||
441 | |||
442 | else: |
||
443 | |||
444 | if self.n_feats <= 0: |
||
445 | |||
446 | res = GetTopologicalTorsionFingerprint( |
||
447 | mol, targetSize=self.target_size, |
||
448 | includeChirality=self.use_chirality) |
||
449 | |||
450 | res = res.GetNonzeroElements() |
||
451 | if self.as_bits: |
||
452 | res = {k: int(v > 0) for k, v in res.items()} |
||
453 | |||
454 | else: |
||
455 | res = GetHashedTopologicalTorsionFingerprint( |
||
456 | mol, nBits=self.n_feats, targetSize=self.target_size, |
||
457 | includeChirality=self.use_chirality) |
||
458 | |||
459 | res = np.array(list(res)) |
||
460 | |||
461 | return res |
||
462 | |||
463 | 1 | @property |
|
464 | def name(self): |
||
465 | return 'top_tort' |
||
466 | |||
467 | 1 | @property |
|
468 | def columns(self): |
||
469 | return pd.RangeIndex(self.n_feats, name='tt_fp_idx') |
||
470 | |||
471 | |||
472 | 1 | class MACCSFeaturizer(Transformer, Featurizer): |
|
473 | |||
474 | """ MACCS Keys Fingerprints.""" |
||
475 | |||
476 | 1 | def __init__(self, n_jobs=1, verbose=True): |
|
477 | |||
478 | """ Initialize a MACCS Featurizer. |
||
479 | |||
480 | Args: |
||
481 | n_jobs (int): |
||
482 | The number of processes to run the featurizer in. |
||
483 | |||
484 | verbose (bool): |
||
485 | Whether to output a progress bar. |
||
486 | """ |
||
487 | |||
488 | super(MACCSFeaturizer, self).__init__(n_jobs=n_jobs, verbose=verbose) |
||
489 | self.n_feats = 166 |
||
490 | |||
491 | 1 | def _transform_mol(self, mol): |
|
492 | return np.array(list(GetMACCSKeysFingerprint(mol)))[1:] |
||
493 | |||
494 | 1 | @property |
|
495 | def name(self): |
||
496 | return 'maccs' |
||
497 | |||
498 | 1 | @property |
|
499 | def columns(self): |
||
500 | return pd.Index( |
||
501 | ['ISOTOPE', '103 < ATOMIC NO. < 256', |
||
502 | 'GROUP IVA,VA,VIA PERIODS 4-6 (Ge...)', 'ACTINIDE', |
||
503 | 'GROUP IIIB,IVB (Sc...)', 'LANTHANIDE', |
||
504 | 'GROUP VB,VIB,VIIB (V...)', 'QAAA@1', 'GROUP VIII (Fe...)', |
||
505 | 'GROUP IIA (ALKALINE EARTH)', '4M RING', 'GROUP IB,IIB (Cu...)', |
||
506 | 'ON(C)C', 'S-S', 'OC(O)O', 'QAA@1', 'CTC', |
||
507 | 'GROUP IIIA (B...)', '7M RING', 'SI', 'C=C(Q)Q', '3M RING', |
||
508 | 'NC(O)O', 'N-O', 'NC(N)N', 'C$=C($A)$A', 'I', |
||
509 | 'QCH2Q', 'P', 'CQ(C)(C)A', 'QX', 'CSN', 'NS', 'CH2=A', |
||
510 | 'GROUP IA (ALKALI METAL)', 'S HETEROCYCLE', |
||
511 | 'NC(O)N', 'NC(C)N', 'OS(O)O', 'S-O', 'CTN', 'F', 'QHAQH', 'OTHER', |
||
512 | 'C=CN', 'BR', 'SAN', 'OQ(O)O', 'CHARGE', |
||
513 | 'C=C(C)C', 'CSO', 'NN', 'QHAAAQH', 'QHAAQH', 'OSO', 'ON(O)C', |
||
514 | 'O HETEROCYCLE', 'QSQ', 'Snot%A%A', 'S=O', |
||
515 | 'AS(A)A', 'A$A!A$A', 'N=O', 'A$A!S', 'C%N', 'CC(C)(C)A', 'QS', |
||
516 | 'QHQH (&...)', 'QQH', 'QNQ', 'NO', 'OAAO', |
||
517 | 'S=A', 'CH3ACH3', 'A!N$A', 'C=C(A)A', 'NAN', 'C=N', 'NAAN', |
||
518 | 'NAAAN', 'SA(A)A', 'ACH2QH', 'QAAAA@1', 'NH2', |
||
519 | 'CN(C)C', 'CH2QCH2', 'X!A$A', 'S', 'OAAAO', 'QHAACH2A', |
||
520 | 'QHAAACH2A', 'OC(N)C', 'QCH3', 'QN', 'NAAO', |
||
521 | '5M RING', 'NAAAO', 'QAAAAA@1', 'C=C', 'ACH2N', '8M RING', 'QO', |
||
522 | 'CL', 'QHACH2A', 'A$A($A)$A', 'QA(Q)Q', |
||
523 | 'XA(A)A', 'CH3AAACH2A', 'ACH2O', 'NCO', 'NACH2A', 'AA(A)(A)A', |
||
524 | 'Onot%A%A', 'CH3CH2A', 'CH3ACH2A', |
||
525 | 'CH3AACH2A', 'NAO', 'ACH2CH2A > 1', 'N=A', |
||
526 | 'HETEROCYCLIC ATOM > 1 (&...)', 'N HETEROCYCLE', 'AN(A)A', |
||
527 | 'OCO', 'QQ', 'AROMATIC RING > 1', 'A!O!A', 'A$A!O > 1 (&...)', |
||
528 | 'ACH2AAACH2A', 'ACH2AACH2A', |
||
529 | 'QQ > 1 (&...)', 'QH > 1', 'OACH2A', 'A$A!N', 'X (HALOGEN)', |
||
530 | 'Nnot%A%A', 'O=A > 1', 'HETEROCYCLE', |
||
531 | 'QCH2A > 1 (&...)', 'OH', 'O > 3 (&...)', 'CH3 > 2 (&...)', |
||
532 | 'N > 1', 'A$A!O', 'Anot%A%Anot%A', |
||
533 | '6M RING > 1', 'O > 2', 'ACH2CH2A', 'AQ(A)A', 'CH3 > 1', |
||
534 | 'A!A$A!A', 'NH', 'OC(C)C', 'QCH2A', 'C=O', |
||
535 | 'A!CH2!A', 'NA(A)A', 'C-O', 'C-N', 'O > 1', 'CH3', 'N', |
||
536 | 'AROMATIC', '6M RING', 'O', 'RING', 'FRAGMENTS'], |
||
537 | name='maccs_idx') |
||
538 | |||
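The `[1:]` slice in `MACCSFeaturizer._transform_mol` reflects the fact that RDKit's MACCS bit vector has 167 positions, with position 0 unused, so the 166 keys sit at positions 1 to 166. The sketch below shows the same slicing on a plain list; `drop_padding_bit` and the example vector are hypothetical and stand in for the RDKit output.

```python
RAW_LENGTH = 167  # RDKit MACCS vectors carry an unused bit at position 0
N_KEYS = 166      # matches self.n_feats in MACCSFeaturizer


def drop_padding_bit(raw_bits):
    """Return the 166 MACCS keys, discarding the unused leading bit."""
    raw_bits = list(raw_bits)
    if len(raw_bits) != RAW_LENGTH:
        raise ValueError("expected %d bits, got %d" % (RAW_LENGTH, len(raw_bits)))
    return raw_bits[1:]


# Made-up vector with only the first and last real keys set.
RAW = [0] * RAW_LENGTH
RAW[1] = 1
RAW[166] = 1
KEYS = drop_padding_bit(RAW)
```

After slicing, key 1 lands at index 0 and key 166 at index 165, which lines up with the 166 labels returned by `columns`.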
539 | |||
540 | 1 | class ErGFeaturizer(Transformer, Featurizer): |
|
541 | |||
542 | """ Extended Reduced Graph Fingerprints. |
||
543 | |||
544 | Implemented in RDKit.""" |
||
545 | |||
546 | 1 | def __init__(self, atom_types=0, fuzz_increment=0.3, min_path=1, |
|
547 | max_path=15, n_jobs=1, verbose=True): |
||
548 | |||
549 | """ Initialize an ErGFeaturizer object. |
||
550 | |||
551 | # TODO complete docstring |
||
552 | |||
553 | Args: |
||
554 | atom_types (AtomPairsParameters): |
||
555 | The atom types to use. |
||
556 | |||
557 | fuzz_increment (float): |
||
558 | The fuzz increment. |
||
559 | |||
560 | min_path (int): |
||
561 | The minimum path. |
||
562 | |||
563 | max_path (int): |
||
564 | The maximum path. |
||
565 | |||
566 | n_jobs (int): |
||
567 | The number of processes to run the featurizer in. |
||
568 | |||
569 | verbose (bool): |
||
570 | Whether to output a progress bar. |
||
571 | """ |
||
572 | |||
573 | super(ErGFeaturizer, self).__init__(n_jobs=n_jobs, verbose=verbose) |
||
574 | self.atom_types = atom_types |
||
575 | self.fuzz_increment = fuzz_increment |
||
576 | self.min_path = min_path |
||
577 | self.max_path = max_path |
||
578 | self.n_feats = 315 |
||
579 | |||
580 | 1 | def _transform_mol(self, mol): |
|
581 | |||
582 | return np.array(GetErGFingerprint(mol)) |
||
583 | |||
584 | 1 | @property |
|
585 | def name(self): |
||
586 | return 'erg' |
||
587 | |||
588 | 1 | @property |
|
589 | def columns(self): |
||
590 | return pd.RangeIndex(self.n_feats, name='erg_fp_idx') |
||
591 | |||
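The hard-coded `n_feats = 315` in `ErGFeaturizer` is consistent with the default path range if one assumes six pharmacophore property types, unordered type pairs (including same-type pairs), and one bin per path length. The helper below is a hypothetical back-of-the-envelope check under those assumptions, not part of skchem or RDKit.

```python
def erg_length(n_property_types=6, min_path=1, max_path=15):
    """Expected ErG vector length: (unordered type pairs) x (path-length bins).

    Assumes six property types and one bin per path length between
    min_path and max_path inclusive.
    """
    n_pairs = n_property_types * (n_property_types + 1) // 2
    n_bins = max_path - min_path + 1
    return n_pairs * n_bins


DEFAULT_LENGTH = erg_length()
SHORTER_LENGTH = erg_length(max_path=10)
```

If this reasoning holds, changing `min_path` or `max_path` would change the fingerprint length, so a fixed `n_feats` of 315 only describes the default settings. Note also that `_transform_mol` above calls `GetErGFingerprint(mol)` without forwarding the stored parameters.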
592 | |||
593 | 1 | class FeatureInvariantsFeaturizer(Transformer, Featurizer): |
|
594 | |||
595 | """ Feature invariants fingerprints. """ |
||
596 | |||
597 | 1 | def __init__(self, n_jobs=1, verbose=True): |
|
598 | |||
599 | """ Initialize a FeatureInvariantsFeaturizer. |
||
600 | |||
601 | Args: |
||
602 | verbose (bool): |
||
603 | Whether to output a progress bar. |
||
604 | """ |
||
605 | super(FeatureInvariantsFeaturizer, self).__init__(n_jobs=n_jobs, |
||
606 | verbose=verbose) |
||
607 | raise NotImplementedError |
||
608 | |||
609 | 1 | def _transform_mol(self, mol): |
|
610 | |||
611 | return np.array(GetFeatureInvariants(mol)) |
||
612 | |||
613 | 1 | @property |
|
614 | def name(self): |
||
615 | return 'feat_inv' |
||
616 | |||
617 | 1 | @property |
|
618 | def columns(self): |
||
619 | return None |
||
620 | |||
621 | |||
622 | 1 | class ConnectivityInvariantsFeaturizer(Transformer, Featurizer): |
|
623 | |||
624 | """ Connectivity invariants fingerprints """ |
||
625 | |||
626 | 1 | def __init__(self, include_ring_membership=True, n_jobs=1, |
|
627 | verbose=True): |
||
628 | |||
629 | """ Initialize a ConnectivityInvariantsFeaturizer. |
||
630 | |||
631 | Args: |
||
632 | include_ring_membership (bool): |
||
633 | Whether ring membership is considered when generating the |
||
634 | invariants. |
||
635 | |||
636 | n_jobs (int): |
||
637 | The number of processes to run the featurizer in. |
||
638 | |||
639 | verbose (bool): |
||
640 | Whether to output a progress bar. |
||
641 | """ |
||
642 | super(ConnectivityInvariantsFeaturizer, self).__init__( |
||
643 | n_jobs=n_jobs, |
||
644 | verbose=verbose) |
||
645 | self.include_ring_membership = include_ring_membership |
||
646 | raise NotImplementedError # this is a sparse descriptor |
||
647 | |||
648 | 1 | def _transform_mol(self, mol): |
|
649 | |||
650 | return np.array(GetConnectivityInvariants(mol)) |
||
651 | |||
652 | 1 | @property |
|
653 | def name(self): |
||
654 | return 'conn_inv' |
||
655 | |||
656 | 1 | @property |
|
657 | def columns(self): |
||
658 | return None |
||
659 | |||
660 | |||
661 | 1 | class RDKFeaturizer(Transformer, Featurizer): |
|
662 | |||
663 | """ RDKit fingerprint """ |
||
664 | |||
665 | 1 | def __init__(self, min_path=1, max_path=7, n_feats=2048, n_bits_per_hash=2, |
|
666 | use_hs=True, target_density=0.0, min_size=128, |
||
667 | branched_paths=True, use_bond_types=True, n_jobs=1, |
||
668 | verbose=True): |
||
669 | |||
670 | """ Initialize an RDKFeaturizer. |
||
671 | |||
672 | Args: |
||
673 | min_path (int): |
||
674 | minimum number of bonds to include in the subgraphs. |
||
675 | |||
676 | max_path (int): |
||
677 | maximum number of bonds to include in the subgraphs. |
||
678 | |||
679 | n_feats (int): |
||
680 | The number of features to which to fold the fingerprint down. |
||
681 | For unfolded, use `-1`. |
||
682 | |||
683 | n_bits_per_hash (int): |
||
684 | number of bits to set per path. |
||
685 | |||
686 | use_hs (bool): |
||
687 | include paths involving Hs in the fingerprint if the molecule |
||
688 | has explicit Hs. |
||
689 | |||
690 | target_density (float): |
||
691 | fold the fingerprint until this minimum density has been |
||
692 | reached. |
||
693 | |||
694 | min_size (int): |
||
695 | the minimum size the fingerprint will be folded to when trying |
||
696 | to reach target_density. |
||
697 | |||
698 | branched_paths (bool): |
||
699 | if set, both branched and unbranched paths will be used in the |
||
700 | fingerprint. |
||
701 | |||
702 | use_bond_types (bool): |
||
703 | if set, bond orders will be used in the path hashes. |
||
704 | |||
705 | n_jobs (int): |
||
706 | The number of processes to run the featurizer in. |
||
707 | |||
708 | verbose (bool): |
||
709 | Whether to output a progress bar. |
||
710 | |||
711 | """ |
||
712 | |||
713 | super(RDKFeaturizer, self).__init__(n_jobs=n_jobs, verbose=verbose) |
||
714 | |||
715 | self.min_path = min_path |
||
716 | self.max_path = max_path |
||
717 | self.n_feats = n_feats |
||
718 | self.n_bits_per_hash = n_bits_per_hash |
||
719 | self.use_hs = use_hs |
||
720 | self.target_density = target_density |
||
721 | self.min_size = min_size |
||
722 | self.branched_paths = branched_paths |
||
723 | self.use_bond_types = use_bond_types |
||
724 | |||
725 | 1 | def _transform_mol(self, mol): |
|
726 | |||
727 | return np.array(list(RDKFingerprint(mol, minPath=self.min_path, |
||
728 | maxPath=self.max_path, |
||
729 | fpSize=self.n_feats, |
||
730 | nBitsPerHash=self.n_bits_per_hash, |
||
731 | useHs=self.use_hs, |
||
732 | tgtDensity=self.target_density, |
||
733 | minSize=self.min_size, |
||
734 | branchedPaths=self.branched_paths, |
||
735 | useBondOrder=self.use_bond_types))) |
||
736 | |||
737 | 1 | @property |
|
738 | def name(self): |
||
739 | return 'rdkfp' |
||
740 | |||
741 | 1 | @property |
|
742 | def columns(self): |
||
743 | return pd.RangeIndex(self.n_feats, name='rdk_fp_idx') |
||
744 |
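The `target_density` and `min_size` arguments of `RDKFeaturizer` describe a folding step: the fingerprint is repeatedly halved until enough of its bits are set, but never below a minimum length. The sketch below illustrates that idea in plain Python on a list of 0/1 values. It is an assumption-laden illustration, not RDKit's implementation: the helper names `fold_once` and `fold_to_density` are hypothetical, and the exact stopping rule RDKit applies may differ.

```python
def fold_once(bits):
    """Halve a bit list by OR-ing the upper half onto the lower half."""
    half = len(bits) // 2
    return [bits[i] | bits[i + half] for i in range(half)]


def fold_to_density(bits, target_density=0.0, min_size=128):
    """Fold until the fraction of set bits reaches target_density.

    Folding stops early if another fold would take the length below
    min_size. A target_density of 0.0 disables folding entirely.
    (Hypothetical helper, written only to illustrate the parameters.)
    """
    while (target_density > 0.0
           and sum(bits) / float(len(bits)) < target_density
           and len(bits) >= 2 * min_size):
        bits = fold_once(bits)
    return bits


# Hypothetical 16-bit fingerprint with two bits set (density 0.125).
fp = [0] * 16
fp[3] = 1
fp[12] = 1

folded = fold_to_density(fp, target_density=0.3, min_size=4)
print(len(folded), sum(folded))
```

With the default `target_density=0.0` the input is returned unchanged, which matches the default in the constructor above, where no density-driven folding is requested.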