Completed
Push — master ( 2bc047...202252 )
by Rich
01:16
created

read_smiles()   F

Complexity

Conditions 11

Size

Total Lines 78

Duplication

Lines 0
Ratio 0 %
Metric Value
cc 11
dl 0
loc 78
rs 3.253

1 Method

Rating   Name   Duplication   Size   Complexity  
A parse() 0 11 1

How to fix   Long Method    Complexity   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
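As a minimal sketch of this advice applied to the function under review, the step commented as "set the header line to pass to the pandas parser" could become its own small function. The name `resolve_header` is hypothetical and not part of skchem; the body follows the intent of that comment block rather than reproducing the original code.

```python
def resolve_header(title_line, kwargs):
    """Return the header row to hand to the CSV parser.

    Hypothetical helper (not part of skchem). Any 'header' entry is
    removed from kwargs so the caller can pass header explicitly
    without passing it twice.
    """
    header = kwargs.pop('header', None)
    if title_line is True:
        # a title line is taken to be line zero, as is usual for smiles files
        header = 0
    return header


options = {'header': 2, 'comment': '#'}
print(resolve_header(False, options))  # the user-supplied header is kept
print(options)                         # 'header' is no longer in the dict
```

The comment that used to introduce the block becomes the docstring, and the function name states what the block was for.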

Commonly applied refactorings include Extract Method.

Complexity

Complex code units like read_smiles() often do a lot of different things. To break such a unit down, we need to identify a cohesive component within it. A common approach to finding such a component is to look for fields/methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
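One way this could look for read_smiles(), which is a function rather than a class, is to group the arguments that describe the column layout into a small class of their own. This is only a sketch: `SmilesLayout` and `rename_columns` are hypothetical names, not part of skchem, and grouping arguments this way is closer to Introduce Parameter Object than to a literal Extract Class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmilesLayout:
    """Hypothetical grouping of the column-related arguments of read_smiles()."""
    smiles_column: int = 0
    name_column: Optional[int] = None

    def rename_columns(self, columns):
        """Return a copy of columns with the smiles column renamed to 'structure'."""
        renamed = list(columns)
        renamed[self.smiles_column] = 'structure'
        return renamed


layout = SmilesLayout(smiles_column=1, name_column=0)
print(layout.rename_columns(['name', 'smiles', 'activity']))
```

Passing one layout object instead of two separate column arguments would also lower the argument count that the inspection reports for read_smiles().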

#! /usr/bin/env python
#
# Copyright (C) 2007-2009 Rich Lewis <[email protected]>
# License: 3-clause BSD

"""
skchem.io.smiles

Defining input and output operations for smiles files.
"""

import skchem
import pandas as pd

Configuration:
The import pandas could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint. When installing your modules, make sure to use the command for the correct Python version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.
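A quick way to check for this cause is to walk the package tree and list every folder that holds Python files but no __init__.py. The sketch below uses only the standard library; `find_missing_init` is a hypothetical helper name, and the directory tree it runs on is a throwaway example, not the real skchem layout.

```python
import os
import tempfile


def find_missing_init(package_root):
    """Return sub-folders that contain .py files but no __init__.py, sorted."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(package_root):
        has_python = any(name.endswith('.py') for name in filenames)
        if has_python and '__init__.py' not in filenames:
            missing.append(os.path.relpath(dirpath, package_root))
    return sorted(missing)


# Demonstration on a temporary directory tree (folder names are made up).
with tempfile.TemporaryDirectory() as root:
    tree = [
        ('pkg', ['__init__.py']),
        (os.path.join('pkg', 'io'), ['smiles.py']),
    ]
    for folder, files in tree:
        os.makedirs(os.path.join(root, folder), exist_ok=True)
        for name in files:
            open(os.path.join(root, folder, name), 'w').close()
    result = find_missing_init(root)
    print(result)
```

Folders without any Python files are ignored, so data or documentation directories do not show up as false positives.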

from skchem.utils import Suppressor

def read_smiles(smiles_file, smiles_column=0, name_column=None, delimiter='\t',
                        title_line=False, force=False, *args, **kwargs):

best-practice: Too many arguments (6/5)

Coding Style: Wrong continued indentation.

    """
    Read a smiles file into a pandas dataframe.  This function wraps the pandas read_csv function.

    @param smiles_file      A file path provided as a :str:, or a :file-like: object.
    @param smiles_column    The column index as an :int: in which the smiles strings are provided.
                            Defaults to _zero_.
    @param name_column      The column index as an :int: in which compound names are provided,
                            for use as the index in the dataframe.  Defaults to _None_.
    @param delimiter        The delimiter used, specified as a :str:.
                            Defaults to _<tab>_.
    @param title_line       A :bool: specifying whether a title line is provided,
                            to use as column titles.
    @param force            A :bool: specifying whether poorly parsed molecules should be skipped (True),
                            or an error thrown (False).
    Additionally, pandas read_csv arguments may be provided.

    @returns df             A dataframe of type :pandas.core.frame.DataFrame:.
    """

    with Suppressor():

        # set the header line to pass to the pandas parser
        # we accept True as being line zero, as is usual for smiles
        # if user specifies a header already, then use it unchanged

        # always pop 'header' from the kwargs so it is never passed twice
        header = kwargs.pop('header', None)
        if title_line is True:
            header = 0

        # open file if not already open
        if isinstance(smiles_file, str):
            smiles_file = open(smiles_file, 'r')

        # read the smiles file
        df = pd.read_csv(smiles_file, delimiter=delimiter, header=header, *args, **kwargs)
Coding Style, Naming:
The name df does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.
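To see why `df` is rejected, the pattern quoted in the message can be tried directly with the standard `re` module. This is an illustrative check only, not how Pylint itself is invoked; `conforms` is a hypothetical helper name.

```python
import re

# Pattern quoted in the message; match() anchors it at the start of the name.
VARIABLE_NAME = re.compile(r'[a-z_][a-z0-9_]{2,30}$')


def conforms(name):
    """Return True if name satisfies the quoted naming pattern."""
    return VARIABLE_NAME.match(name) is not None


for candidate in ['df', 'frame', 'smiles_df', 'DataFrame']:
    print(candidate, conforms(candidate))
```

The pattern requires one leading character plus at least two more, so any two-letter name such as `df` fails on length alone.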

        # replace the smiles column with the structure column
        lst = list(df.columns)
        lst[smiles_column] = 'structure'
        df.columns = lst

        # apply the from smiles constructor
        if force:
            def parse(smiles):

                """
                Parse a molecule from smiles string and return None if it doesn't load
                (restoring rdkit functionality)
                """

                try:
                    return skchem.Mol.from_smiles(smiles)
                except ValueError:
                    return None
        else:
            def parse(smiles):
                """
                Parse a molecule from smiles string
                """
                return skchem.Mol.from_smiles(smiles)

Bug: The Class Mol does not seem to have a member named from_smiles. (Reported on both calls to skchem.Mol.from_smiles above.)

This check looks for calls to members that are non-existent. These calls will fail.

The member could have been renamed or removed.

        df['structure'] = df['structure'].apply(str).apply(parse) #make sure is a string

        if force:
            df = df[df['structure'].notnull()]

        # set index if passed
        if name_column is not None:
            df = df.set_index(df.columns[name_column])

        return df

Coding Style, Naming: The name df does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$). (Reported again on both reassignments of df above.)

@classmethod
def from_smiles(_, *args, **kwargs):
    """ Create a DataFrame from a smiles file """
    return read_smiles(*args, **kwargs)

#set on pandas dataframe
pd.DataFrame.from_smiles = from_smiles
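
The closing lines attach read_smiles() to pandas.DataFrame as an alternative constructor. The same pattern can be sketched without pandas or skchem by using a stand-in class; `Table`, `read_rows`, and `from_text` are hypothetical names used only for illustration.

```python
class Table:
    """Stand-in for a third-party class such as pandas.DataFrame."""

    def __init__(self, rows):
        self.rows = rows


def read_rows(text, delimiter='\t'):
    """Split text into rows of fields and wrap them in a Table."""
    return Table([line.split(delimiter) for line in text.splitlines()])


@classmethod
def from_text(_, *args, **kwargs):
    """Alternative constructor; the class argument is ignored, as in the original."""
    return read_rows(*args, **kwargs)


# attach the constructor to the existing class
Table.from_text = from_text

table = Table.from_text('CCO\tethanol\nc1ccccc1\tbenzene')
print(table.rows)
```

Because the class argument is discarded, a subclass calling `from_text` would still get a plain `Table` back; forwarding the class to the reader function would be needed to change that.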