OrganicFilter   A
last analyzed

Complexity

Total Complexity 3

Size/Duplication

Total Lines 61
Duplicated Lines 0 %

Test Coverage

Coverage 100%

Importance

Changes 1
Bugs 0 Features 1
Metric Value
wmc 3
c 1
b 0
f 1
dl 0
loc 61
ccs 4
cts 4
cp 1
rs 10

1 Method

Rating   Name   Duplication   Size   Complexity  
A __init__() 0 15 3
#! /usr/bin/env python
#
# Copyright (C) 2016 Rich Lewis <[email protected]>
# License: 3-clause BSD

"""

# skchem.filters.simple

Simple filters for compounds.

"""

from collections import Counter

import numpy as np
import pandas as pd

from ..resource import ORGANIC, PERIODIC_TABLE
from .base import Filter


class ElementFilter(Filter):

    """ Filter by elements.

        Examples:

            Basic usage on molecules:

            >>> import skchem
            >>> hal_f = skchem.filters.ElementFilter(['F', 'Cl', 'Br', 'I'])

            Molecules containing any of the elements transform to `True`.

            >>> m1 = skchem.Mol.from_smiles('ClC(Cl)Cl', name='chloroform')
            >>> hal_f.transform(m1)
            True

            Molecules containing none of the elements transform to `False`.

            >>> m2 = skchem.Mol.from_smiles('CC', name='ethane')
            >>> hal_f.transform(m2)
            False

            The per-element breakdown is available by passing `agg=False`:

            >>> hal_f.transform(m1, agg=False)
            has_element
            F     0
            Cl    3
            Br    0
            I     0
            Name: ElementFilter, dtype: int64

            Can transform series.

            >>> ms = [m1, m2]
            >>> hal_f.transform(ms)
            chloroform     True
            ethane        False
            dtype: bool

            >>> hal_f.transform(ms, agg=False)
            has_element  F  Cl  Br  I
            chloroform   0   3   0  0
            ethane       0   0   0  0

            Can also filter series:

            >>> hal_f.filter(ms)
            chloroform    <Mol: ClC(Cl)Cl>
            Name: structure, dtype: object

            >>> hal_f.filter(ms, neg=True)
            ethane    <Mol: CC>
            Name: structure, dtype: object

        """
    # Review note (best-practice): too many arguments (6/5).
    def __init__(self, elements=None, as_bits=False, agg='any', n_jobs=1,
                 verbose=True):

        """ Initialize an ElementFilter object.

        Args:
            elements (list[str]):
                A list of element symbols to look for.  With the default
                `agg='any'`, a molecule containing at least one of these
                elements transforms to True, otherwise False.  If None, all
                elements in the periodic table are used.

            as_bits (bool):
                Whether to return presence bits (0 or 1) rather than integer
                counts for each element.

            agg (str or callable):
                The aggregation used to combine the per-element values into a
                single predicate, given by name (e.g. 'any') or as a callable.

            n_jobs (int):
                How many processes to use.

            verbose (bool):
                Whether to output a progress bar.
        """

        self._elements = None

        self.elements = elements
        self.as_bits = as_bits
        super(ElementFilter, self).__init__(agg=agg, n_jobs=n_jobs,
                                            verbose=verbose)

    @property
    def elements(self):
        """ The element symbols this filter looks for. """
        return self._elements

    @elements.setter
    def elements(self, val):
        """ Set the elements, defaulting to the whole periodic table. """
        if val is None:
            self._elements = PERIODIC_TABLE.symbol.tolist()
        else:
            self._elements = val

    @property
    def columns(self):
        return pd.Index(self.elements, name='has_element')

    # Review note (bug): this method seems to be hidden by an attribute
    # defined in skchem.filters.base on line 174.
    def _transform_mol(self, mol):

        # Count how many atoms of each element the molecule contains.
        counter = Counter(atom.symbol for atom in mol.atoms)
        res = pd.Series(counter)

        # Use reindex rather than res[self.elements]: elements absent from
        # the molecule become NaN instead of raising a KeyError.
        res = res.reindex(self.elements).fillna(0).astype(int)

        if self.as_bits:
            res = (res > 0).astype(np.uint8)

        return res
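
The counting step in `_transform_mol` can be illustrated without skchem or pandas. The sketch below is a simplified stand-in, not the library's implementation: `count_elements` and `has_any` are hypothetical helper names, and a molecule is represented as a plain list of heavy-atom symbols.

```python
from collections import Counter


def count_elements(symbols, elements, as_bits=False):
    """Count occurrences of each element of interest in a list of symbols."""
    counter = Counter(symbols)
    # A Counter returns 0 for missing keys, so absent elements need no
    # special handling here.
    counts = {element: counter[element] for element in elements}
    if as_bits:
        counts = {element: int(n > 0) for element, n in counts.items()}
    return counts


def has_any(symbols, elements):
    """Aggregate the per-element counts with `any`, mirroring agg='any'."""
    return any(count_elements(symbols, elements).values())


halogens = ['F', 'Cl', 'Br', 'I']
chloroform = ['Cl', 'C', 'Cl', 'Cl']  # heavy atoms of ClC(Cl)Cl
ethane = ['C', 'C']

print(count_elements(chloroform, halogens))
print(has_any(chloroform, halogens), has_any(ethane, halogens))
```

Under these assumptions chloroform reports three chlorine atoms and passes the halogen predicate, while ethane does not, which matches the doctests in the class docstring.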


class OrganicFilter(ElementFilter):

    """ Whether a molecule is organic.

        For the purpose of this function, an organic molecule is defined as
        having atoms with elements only in the set H, B, C, N, O, F, P, S, Cl,
        Br, I.

        Examples:
                Basic usage as a function on molecules:

                >>> import skchem
                >>> of = skchem.filters.OrganicFilter()
                >>> benzene = skchem.Mol.from_smiles('c1ccccc1', name='benzene')

                >>> of.transform(benzene)
                True

                >>> ferrocene = skchem.Mol.from_smiles('[cH-]1cccc1.[cH-]1cccc1.[Fe+2]',
                ...                                    name='ferrocene')
                >>> of.transform(ferrocene)
                False

                More useful on collections:

                >>> sa = skchem.Mol.from_smiles('CC(=O)[O-].[Na+]', name='sodium acetate')
                >>> norbornane = skchem.Mol.from_smiles('C12CCC(C2)CC1', name='norbornane')

                >>> data = [benzene, ferrocene, norbornane, sa]
                >>> of.transform(data)
                benzene            True
                ferrocene         False
                norbornane         True
                sodium acetate    False
                dtype: bool

                >>> of.filter(data)
                benzene          <Mol: c1ccccc1>
                norbornane    <Mol: C1CC2CCC1C2>
                Name: structure, dtype: object

                >>> of.filter(data, neg=True)
                ferrocene         <Mol: [Fe+2].c1cc[cH-]c1.c1cc[cH-]c1>
                sodium acetate                  <Mol: CC(=O)[O-].[Na+]>
                Name: structure, dtype: object
        """

    def __init__(self, n_jobs=1, verbose=True):

        """ Initialize an OrganicFilter object.

        Args:
            n_jobs (int):
                The number of processes to run the filter in.
            verbose (bool):
                Whether to output a progress bar.
        """

        super(OrganicFilter, self).__init__(elements=None, agg='not any',
                                            n_jobs=n_jobs, verbose=verbose)
        # Track only the non-organic elements: with agg='not any', a molecule
        # is organic when it contains none of them.
        self.elements = [element for element in self.elements
                         if element not in ORGANIC]
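
The 'not any' logic above can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's code: `ORGANIC_ELEMENTS` is a stand-in for the `ORGANIC` resource, filled in from the element set listed in the class docstring, and `is_organic` is a hypothetical helper operating on lists of element symbols.

```python
# Assumed stand-in for skchem's ORGANIC resource, copied from the docstring.
ORGANIC_ELEMENTS = {'H', 'B', 'C', 'N', 'O', 'F', 'P', 'S', 'Cl', 'Br', 'I'}


def is_organic(symbols):
    """Return True when no atom symbol falls outside the organic set."""
    return not any(symbol not in ORGANIC_ELEMENTS for symbol in symbols)


benzene = ['C'] * 6
sodium_acetate = ['C', 'C', 'O', 'O', 'Na']
ferrocene = ['C'] * 10 + ['Fe']

print(is_organic(benzene), is_organic(sodium_acetate), is_organic(ferrocene))
```

Checking for the presence of excluded elements, then negating, is what lets the class reuse the counting machinery of `ElementFilter` unchanged.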


def n_atoms(mol, above=2, below=75, include_hydrogens=False):

    """ Whether the number of atoms in a molecule falls in a defined interval.

    ``above <= n_atoms < below``

    Args:
        mol (skchem.Mol):
            The molecule to be tested.
        above (int):
            The lower threshold number of atoms (inclusive).
        below (int):
            The higher threshold number of atoms (exclusive).
        include_hydrogens (bool):
            Whether to consider hydrogens in the atom count.

    Returns:
        bool:
            Whether the number of atoms falls in the interval.

    Examples:

        Basic usage as a function on molecules:

        >>> import skchem
        >>> m = skchem.Mol.from_smiles('c1ccccc1') # benzene has 6 atoms.

        Lower threshold:

        >>> skchem.filters.n_atoms(m, above=3)
        True
        >>> skchem.filters.n_atoms(m, above=8)
        False

        Higher threshold:

        >>> skchem.filters.n_atoms(m, below=8)
        True
        >>> skchem.filters.n_atoms(m, below=3)
        False

        Bounds work like Python slices - inclusive lower, exclusive upper:

        >>> skchem.filters.n_atoms(m, above=6)
        True
        >>> skchem.filters.n_atoms(m, below=6)
        False

        Both can be used at once:

        >>> skchem.filters.n_atoms(m, above=3, below=8)
        True

        Can include hydrogens:

        >>> skchem.filters.n_atoms(m, above=3, below=8, include_hydrogens=True)
        False
        >>> skchem.filters.n_atoms(m, above=9, below=14, include_hydrogens=True)
        True
    """

    assert above < below, 'Interval {} < a < {} undefined.'.format(above,
                                                                   below)

    # Heavy atoms only; hydrogens are added on request below.
    n_a = len(mol.atoms)
    if include_hydrogens:
        n_a += sum(atom.GetNumImplicitHs() + atom.GetNumExplicitHs()
                   for atom in mol.atoms)

    return above <= n_a < below
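
The half-open interval test used by `n_atoms` can be tried in isolation. The following is a minimal sketch with no skchem dependency; `in_half_open_interval` is a hypothetical name, and the atom counts for benzene are entered by hand.

```python
def in_half_open_interval(n, above=2, below=75):
    """Return True when above <= n < below (inclusive lower, exclusive upper)."""
    if not above < below:
        raise ValueError('Interval {} <= n < {} is empty.'.format(above, below))
    return above <= n < below


n_heavy = 6        # benzene: six carbon atoms
n_with_h = 6 + 6   # the same molecule with its six hydrogens counted

print(in_half_open_interval(n_heavy, above=6))   # lower bound is inclusive
print(in_half_open_interval(n_heavy, below=6))   # upper bound is exclusive
print(in_half_open_interval(n_with_h, above=9, below=14))
```

The sketch raises `ValueError` for an empty interval instead of using `assert`, so the check would survive running Python with `-O`; that is a design choice of the example, not a description of the module.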


# Review note (duplication): this code seems to be duplicated in the project.
class AtomNumberFilter(Filter):

    """Filter whether the number of atoms in a Mol falls in a defined interval.

    `above <= n_atoms < below`

    Examples:
        >>> import skchem

        >>> data = [
        ...         skchem.Mol.from_smiles('CC', name='ethane'),
        ...         skchem.Mol.from_smiles('CCCC', name='butane'),
        ...         skchem.Mol.from_smiles('NC(C)C(=O)O', name='alanine'),
        ...         skchem.Mol.from_smiles('C12C=CC(C=C2)C=C1', name='barrelene')
        ... ]

        >>> af = skchem.filters.AtomNumberFilter(above=3, below=7)

        >>> af.transform(data)
        ethane       False
        butane        True
        alanine       True
        barrelene    False
        Name: num_atoms_in_range, dtype: bool

        >>> af.filter(data)
        butane            <Mol: CCCC>
        alanine    <Mol: CC(N)C(=O)O>
        Name: structure, dtype: object

        >>> af = skchem.filters.AtomNumberFilter(above=5, below=15, include_hydrogens=True)

        >>> af.transform(data)
        ethane        True
        butane        True
        alanine       True
        barrelene    False
        Name: num_atoms_in_range, dtype: bool
    """

    # Review note (best-practice): too many arguments (6/5).
    def __init__(self, above=3, below=60, include_hydrogens=False, n_jobs=1,
                 verbose=True):

        """ Initialize an AtomNumberFilter object.

        Args:
            above (int):
                The lower threshold on the number of atoms (inclusive).
            below (int):
                The higher threshold on the number of atoms (exclusive).
            include_hydrogens (bool):
                Whether to consider hydrogens in the atom count.
            n_jobs (int):
                The number of processes to run the filter in.
            verbose (bool):
                Whether to output a progress bar.
        """

        assert above < below, 'Interval {} < a < {} undefined.'.format(above,
                                                                       below)
        self.above = above
        self.below = below
        self.include_hydrogens = include_hydrogens

        super(AtomNumberFilter, self).__init__(agg='any', n_jobs=n_jobs,
                                               verbose=verbose)

    # Review note (bug): this method seems to be hidden by an attribute
    # defined in skchem.filters.base on line 174.
    def _transform_mol(self, mol):
        return n_atoms(mol, above=self.above, below=self.below,
                       include_hydrogens=self.include_hydrogens)

    @property
    def columns(self):
        return pd.Index(['num_atoms_in_range'])


def mass(mol, above=10, below=900):

    """ Whether the molecular weight of a molecule falls in a defined interval.

    `above <= mass < below`

    Args:
        mol (skchem.Mol):
            The molecule to be tested.
        above (float):
            The lower threshold on the mass (inclusive).
            Defaults to 10.
        below (float):
            The higher threshold on the mass (exclusive).
            Defaults to 900.

    Returns:
        bool:
            Whether the mass of the molecule falls in the interval.

    Examples:
        Basic usage as a function on molecules:

        >>> import skchem
        >>> m = skchem.Mol.from_smiles('c1ccccc1') # benzene has M_r = 78.
        >>> skchem.filters.mass(m, above=70)
        True
        >>> skchem.filters.mass(m, above=80)
        False
        >>> skchem.filters.mass(m, below=80)
        True
        >>> skchem.filters.mass(m, below=70)
        False
        >>> skchem.filters.mass(m, above=70, below=80)
        True
    """

    return above <= mol.mass < below
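
The mass check can be reproduced by hand for the benzene example in the docstring. This sketch is an assumption-laden illustration: it does not use `mol.mass`, the `ATOMIC_MASS` table holds only the two standard atomic weights needed here, and `molecular_mass` and `mass_in_range` are hypothetical helpers.

```python
# Standard atomic weights for the two elements in benzene.
ATOMIC_MASS = {'C': 12.011, 'H': 1.008}


def molecular_mass(formula):
    """Sum atomic masses for a formula given as {symbol: count}."""
    return sum(ATOMIC_MASS[symbol] * count for symbol, count in formula.items())


def mass_in_range(formula, above=10, below=900):
    """Return True when above <= mass < below, as in the function above."""
    return above <= molecular_mass(formula) < below


benzene = {'C': 6, 'H': 6}

print(round(molecular_mass(benzene), 3))
print(mass_in_range(benzene, above=70, below=80))
print(mass_in_range(benzene, above=80))
```

With these weights benzene comes to roughly 78.1, so it falls inside the 70 to 80 window and fails a lower bound of 80, consistent with the doctests.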


# Review note (duplication): this code seems to be duplicated in the project.
class MassFilter(Filter):

    """ Filter whether the molecular weight of a molecule falls within a range.

    `above <= mass < below`

    Examples:

        >>> import skchem

        >>> data = [
        ...         skchem.Mol.from_smiles('CC', name='ethane'),
        ...         skchem.Mol.from_smiles('CCCC', name='butane'),
        ...         skchem.Mol.from_smiles('NC(C)C(=O)O', name='alanine'),
        ...         skchem.Mol.from_smiles('C12C=CC(C=C2)C=C1', name='barrelene')
        ... ]

        >>> mf = skchem.filters.MassFilter(above=31, below=100)

        >>> mf.transform(data)
        ethane       False
        butane        True
        alanine       True
        barrelene    False
        Name: mass_in_range, dtype: bool

        >>> mf.filter(data)
        butane            <Mol: CCCC>
        alanine    <Mol: CC(N)C(=O)O>
        Name: structure, dtype: object

    """

    def __init__(self, above=3, below=900, n_jobs=1, verbose=True):

        """ Initialize a MassFilter object.

        Args:
            above (float):
                The lower threshold on the mass (inclusive).
            below (float):
                The higher threshold on the mass (exclusive).
            n_jobs (int):
                The number of processes to run the filter in.
            verbose (bool):
                Whether to output a progress bar.
        """

        assert above < below, 'Interval {} < a < {} undefined.'.format(above,
                                                                       below)
        self.above = above
        self.below = below

        super(MassFilter, self).__init__(agg='any', n_jobs=n_jobs,
                                         verbose=verbose)

    # Review note (bug): this method seems to be hidden by an attribute
    # defined in skchem.filters.base on line 174.
    def _transform_mol(self, mol):
        return mass(mol, above=self.above, below=self.below)

    @property
    def columns(self):
        return pd.Index(['mass_in_range'])
453