Completed
Push — master ( 3371cb...5cb87e )
by Rich
01:23
created

read_smiles()   F

Complexity

Conditions 10

Size

Total Lines 95

Duplication

Lines 0
Ratio 0 %

Importance

Changes 5
Bugs 0 Features 2
Metric Value
cc 10
c 5
b 0
f 2
dl 0
loc 95
rs 3.2727

1 Method

Rating   Name   Duplication   Size   Complexity  
A parse() 0 12 4

How to fix

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

A commonly applied refactoring for long methods is Extract Method.

Complexity

Complex functions like read_smiles() often do a lot of different things. To break such a function down, we need to identify a cohesive component within it. A common approach to find such a component is to look for parameters or variables that share the same prefix or suffix.

Once you have determined the names that belong together, you can apply the Extract Class refactoring to group them. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
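For read_smiles(), the three parameters error_bad_mol, warn_bad_mol and drop_bad_mol share the suffix `_bad_mol`, which hints at one cohesive component. The sketch below groups them with Extract Class; `BadMolPolicy` and its `report` method are hypothetical names, not part of skchem, and the behaviour is an assumption modelled on the error handling in the listing.

```python
import warnings


class BadMolPolicy(object):
    """Hypothetical grouping of the three *_bad_mol flags of read_smiles()."""

    def __init__(self, error=False, warn=True, drop=True):
        self.error = error  # raise on a failed parse
        self.warn = warn    # otherwise emit a warning
        self.drop = drop    # drop rows whose structure is None afterwards

    def report(self, name):
        """Raise or warn for the molecule called `name`; return None."""
        msg = 'Molecule {} could not be decoded.'.format(name)
        if self.error:
            raise ValueError(msg)
        elif self.warn:
            warnings.warn(msg)
        return None


policy = BadMolPolicy(error=False, warn=True)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = policy.report('mol_42')

print(result)                  # None
print(str(caught[0].message))  # Molecule mol_42 could not be decoded.
```

With such a class, read_smiles() could take one policy argument instead of three flags, which would also reduce the argument count flagged in the listing.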

#! /usr/bin/env python
Bug: There seems to be a cyclic import. Six cycles are reported:

skchem.core.atom -> skchem.core.base
skchem.core -> skchem.core.mol
skchem -> skchem.descriptors -> skchem.descriptors.physicochemical -> skchem.descriptors.fingerprints
skchem -> skchem.descriptors -> skchem.descriptors.atom
skchem.core -> skchem.core.bond
skchem -> skchem.descriptors -> skchem.descriptors.fingerprints

Cyclic imports may cause partly loaded modules to be returned. This might lead to unexpected runtime behavior which is hard to debug.
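One common way to break such a cycle is to defer one of the imports into the function that needs it, so that neither module requires the other while it is still loading. The sketch below builds a throwaway two-module package in a temporary directory to show the pattern. `demo_pkg`, `Base`, `Atom` and `make_atom` are hypothetical names and do not reflect how skchem is actually laid out.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical two-module package; the names are illustrative, not skchem's.
SOURCES = {
    "demo_pkg/__init__.py": "",
    # base.py needs Atom only inside a function, so it imports it lazily.
    "demo_pkg/base.py": (
        "class Base(object):\n"
        "    def make_atom(self):\n"
        "        from .atom import Atom  # deferred: resolved at call time\n"
        "        return Atom()\n"
    ),
    # atom.py needs Base at class-definition time, so it keeps a top-level import.
    "demo_pkg/atom.py": (
        "from .base import Base\n"
        "class Atom(Base):\n"
        "    pass\n"
    ),
}

# Write the package to a fresh temporary directory and make it importable.
root = Path(tempfile.mkdtemp())
for rel_path, source in SOURCES.items():
    target = root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(source)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

base = importlib.import_module("demo_pkg.base")
atom = base.Base().make_atom()
print(type(atom).__name__)  # Atom
```

Only one direction of the dependency remains at module level (atom imports base), so either module can be imported first.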
#
# Copyright (C) 2007-2009 Rich Lewis <[email protected]>
# License: 3-clause BSD

"""
# skchem.io.smiles

Defining input and output operations for smiles files.
"""

import warnings
from functools import wraps

import pandas as pd
Configuration: The import pandas could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands (abc in the example below is a placeholder package name).

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3

Tip: We are currently not using virtualenv to run pylint; when installing your modules, make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.
from ..utils import Suppressor
from ..core import Mol

def read_smiles(smiles_file, smiles_column=0, name_column=None, delimiter='\t',
Best practice: Too many arguments (8/5)
                title_line=False, error_bad_mol=False, warn_bad_mol=True,
                drop_bad_mol=True, *args, **kwargs):

    """Read a smiles file into a pandas dataframe.

    This function wraps the pandas read_csv function.

    smiles_file (str, file-like):
        Location of data to load, specified as a string or passed directly as a
        file-like object.  URLs may also be used, see the pandas.read_csv
        documentation.
    smiles_column (int):
        The column index at which SMILES are provided.
        Defaults to `0`.
    name_column (int):
        The column index at which compound names are provided, for use as the
        index in the dataframe.  If None, use the default index.
        Defaults to `None`.
    delimiter (str):
        The delimiter used.
        Defaults to `\\t`.
    title_line (bool):
        Whether a title line is provided, to use as column titles.
        Defaults to `False`.
    error_bad_mol (bool):
        Whether an error should be raised when a molecule fails to parse.
        Defaults to `False`.
    warn_bad_mol (bool):
        Whether a warning should be issued when a molecule fails to parse.
        Defaults to `True`.
    drop_bad_mol (bool):
        If true, drop any row with smiles that failed to parse. Otherwise,
        the field is None. Defaults to `True`.
    *args, **kwargs:
        Additional arguments are passed on to pandas read_csv.

    Returns:
        pandas.DataFrame:
            The loaded data frame, with Mols supplied in the `structure` field.

    See Also:
        pandas.read_csv
        skchem.Mol.from_smiles
        skchem.io.sdf

    """

    with Suppressor():

        # set the header line to pass to the pandas parser
        # we accept True as being line zero, as is usual for smiles
        # if user specifies a header already, then use it unchanged

        # pop header from the kwargs so it is not passed twice
        header = kwargs.pop('header', None)
        if title_line is True:
            header = 0

        # read the smiles file
        data = pd.read_csv(smiles_file, delimiter=delimiter, header=header,
                           *args, **kwargs)

        # replace the smiles column with the structure column
        lst = list(data.columns)
        lst[smiles_column] = 'structure'
        data.columns = lst

        def parse(row):
            """ Parse smiles for row """
            try:
                return Mol.from_smiles(row.structure)
Bug: The Class Mol does not seem to have a member named from_smiles.

This check looks for calls to members that are non-existent. These calls will fail.

The member could have been renamed or removed.
            except ValueError:
                msg = 'Molecule {} could not be decoded.'.format(row.name)
                if error_bad_mol:
                    raise ValueError(msg)
                elif warn_bad_mol:
                    warnings.warn(msg)

                return None

        data['structure'] = data['structure'].apply(str)
        data['structure'] = data.apply(parse, axis=1)

        if drop_bad_mol:
            data = data[data['structure'].notnull()]

        # set index if passed
        if name_column is not None:
            data = data.set_index(data.columns[name_column])

        return data

@classmethod
@wraps(read_smiles)
def _from_smiles(_, *args, **kwargs):
    return read_smiles(*args, **kwargs)

# set on pandas dataframe
pd.DataFrame.from_smiles = _from_smiles
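The last lines of the listing attach read_smiles to pandas.DataFrame as an alternate constructor. The same pattern can be sketched with only the standard library, so that it runs without pandas or skchem; `Table`, `read_rows` and `from_rows` are hypothetical stand-ins for DataFrame, read_smiles and from_smiles.

```python
from functools import wraps


class Table(object):
    """Hypothetical stand-in for pandas.DataFrame."""

    def __init__(self, rows):
        self.rows = rows


def read_rows(text, delimiter='\t'):
    """Split delimited text into a Table (stand-in for read_smiles)."""
    return Table([line.split(delimiter) for line in text.splitlines()])


@classmethod
@wraps(read_rows)  # copy the name and docstring of read_rows onto the wrapper
def _from_rows(_, *args, **kwargs):
    # The class argument is ignored: the reader decides what to build.
    return read_rows(*args, **kwargs)


# Attach the wrapper to the existing class after its definition.
Table.from_rows = _from_rows

table = Table.from_rows('CCO\tethanol\nc1ccccc1\tbenzene')
print(table.rows)               # [['CCO', 'ethanol'], ['c1ccccc1', 'benzene']]
print(Table.from_rows.__name__)  # read_rows
```

Because the wrapper ignores its class argument, a subclass calling from_rows would still receive a plain Table; the same holds for _from_smiles and subclasses of DataFrame in the listing.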