skchem.io.read_smiles() - Code Metrics - Inspection of "updated pylintrc" - richlewis42/scikit-chem

Status: Completed

Push to master (2bc047...202252) by Rich, created 2016-04-14 16:47 UTC

skchem.io.read_smiles()	Rating: F

↳ Parent: Project

Duplication

Lines	0
Ratio	0 %

Metric	Value
cc (cyclomatic complexity)	11
dl (duplicated lines)	0
loc (lines of code)	78
rs	3.253
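
To make the `cc` figure more concrete, here is a minimal sketch of how a cyclomatic complexity count can be approximated with the standard-library `ast` module. It is an illustration only: the function name `approximate_complexity` and the choice of node types are assumptions made for this example, and the result is not claimed to match the value Scrutinizer computes.

```python
import ast

# Node types treated as decision points in this approximation (an assumption;
# real tools may count a different set of constructs).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approximate_complexity(source):
    """Return 1 plus the number of decision points found in `source`."""
    tree = ast.parse(source)
    complexity = 1  # a straight-line program has exactly one path
    for node in ast.walk(tree):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` contributes two extra decision points
            complexity += len(node.values) - 1
    return complexity


# A small function shaped like the header handling in read_smiles.
SAMPLE = '''
def pick_header(title_line, header):
    if title_line is True:
        return 0
    elif header is not None:
        return header
    else:
        return None
'''

sample_complexity = approximate_complexity(SAMPLE)
print(sample_complexity)
```

Each `if`/`elif`, loop, and `except` clause adds one to the count, which is why a function that mixes header handling, file opening, parsing, and filtering accumulates a high score.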

1 Method

Rating	Name	Duplication	Size	Complexity
A	parse()	0	11	1

How to fix: Long Method, Complexity

#! /usr/bin/env python
#
# Copyright (C) 2007-2009 Rich Lewis <[email protected]>
# License: 3-clause BSD

"""
skchem.io.smiles

Defining input and output operations for smiles files.
"""

import skchem
import pandas as pd
from skchem.utils import Suppressor

def read_smiles(smiles_file, smiles_column=0, name_column=None, delimiter='\t',
                title_line=False, force=False, *args, **kwargs):

    """
    Read a smiles file into a pandas dataframe.  This function wraps the pandas read_csv function.

    @param smiles_file      A file path provided as a :str:, or a :file-like: object.
    @param smiles_column    The column index as an :int: in which the smiles strings are provided.
                            Defaults to _zero_.
    @param name_column      The column index as an :int: in which compound names are provided,
                            for use as the index in the dataframe.  Defaults to _None_.
    @param delimiter        The delimiter used, specified as a :str:.
                            Defaults to _<tab>_.
    @param title_line       A :bool: specifying whether a title line is provided,
                            to use as column titles.
    @param force            A :bool: specifying whether poorly parsed molecules should be skipped,
                            or an error thrown.
    Additionally, pandas read_csv arguments may be provided.

    @returns df             A dataframe of type :pandas.core.frame.DataFrame:.
    """

    with Suppressor():

        # set the header line to pass to the pandas parser
        # title_line=True means line zero holds the column titles, as is usual for smiles
        # otherwise use the header the caller supplied through kwargs, if any

        # pop 'header' from kwargs so it is never passed to read_csv twice
        header = kwargs.pop('header', None)
        if title_line:
            header = 0

        # read the smiles file; pandas accepts either a path or an open file-like
        # object, so the file is not opened here and no handle is left unclosed
        df = pd.read_csv(smiles_file, *args, delimiter=delimiter, header=header, **kwargs)


        # replace the smiles column with the structure column
        lst = list(df.columns)
        lst[smiles_column] = 'structure'
        df.columns = lst

        # apply the from smiles constructor
        if force:
            def parse(smiles):

                """
                Parse a molecule from smiles string and return None if it doesn't load
                (restoring rdkit functionality)
                """

                try:
                    return skchem.Mol.from_smiles(smiles)

                except ValueError:
                    return None
        else:
            def parse(smiles):
                """
                Parse a molecule from smiles string
                """
                return skchem.Mol.from_smiles(smiles)


        df['structure'] = df['structure'].apply(str).apply(parse) #make sure is a string

        if force:
            df = df[df['structure'].notnull()]


        # set index if passed
        if name_column is not None:
            df = df.set_index(df.columns[name_column])


        return df

@classmethod
def from_smiles(_, *args, **kwargs):
    """ Create a DataFrame from a smiles file """
    return read_smiles(*args, **kwargs)

#set on pandas dataframe
pd.DataFrame.from_smiles = from_smiles
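
One way to address the Long Method and Complexity ratings is to move the branching out of `read_smiles` into small helpers. The sketch below is a suggestion under stated assumptions, not the project's actual code: `resolve_header` and `make_parser` are hypothetical names introduced here, and `fake_from_smiles` is a stand-in for `skchem.Mol.from_smiles` so the example runs without skchem installed.

```python
def resolve_header(title_line, kwargs):
    """Pop 'header' from kwargs and return the value to give the CSV reader."""
    header = kwargs.pop('header', None)  # popping avoids passing it twice
    if title_line:
        return 0  # a title line always means "use line zero"
    return header


def make_parser(constructor, force):
    """Return `constructor`, or a wrapper that maps ValueError to None."""
    if not force:
        return constructor

    def parse(smiles):
        try:
            return constructor(smiles)
        except ValueError:
            return None
    return parse


# Hypothetical stand-in for skchem.Mol.from_smiles, for demonstration only.
def fake_from_smiles(smiles):
    if not smiles or ' ' in smiles:
        raise ValueError('could not parse: %r' % smiles)
    return ('mol', smiles)


options = {'header': 2, 'comment': '#'}
header = resolve_header(title_line=False, kwargs=options)

lenient = make_parser(fake_from_smiles, force=True)
strict = make_parser(fake_from_smiles, force=False)

print(header, options)
print(lenient('CCO'), lenient('not valid'))
```

With helpers like these, the body of `read_smiles` would reduce to resolving the header, reading the file, applying the parser, and setting the index, each as a single statement, and the two near-identical `parse` definitions collapse into one.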


Issues reported for skchem.io.smiles (line numbers refer to the original source file)

Line 13: import pandas as pd
Configuration, introduced 2016-01-19 16:21 UTC
The import `pandas` could not be resolved. This can be caused by one of the following:
1. Missing dependencies. This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands:
    # .scrutinizer.yml
    before_commands:
        - sudo pip install abc # Python2
        - sudo pip3 install abc # Python3
   Tip: Pylint is currently not run inside a virtualenv, so when installing your modules make sure to use the command for the correct Python version.
2. Missing __init__.py files. This error could also result from missing `__init__.py` files in your module folders. Make sure that you place one file in each sub-folder.

Line 16: def read_smiles(smiles_file, smiles_column=0, name_column=None, delimiter='\t',
best-practice, introduced 2016-04-14 16:48 UTC
Too many arguments (6/5)

Line 17: title_line=False, force=False, *args, **kwargs):
Coding Style, introduced 2016-04-14 16:48 UTC
Wrong continued indentation.

Line 57: df = pd.read_csv(smiles_file, delimiter=delimiter, header=header, *args, **kwargs)
Line 87: df = df[df['structure'].notnull()]
Line 91: df = df.set_index(df.columns[name_column])
Coding Style, Naming, introduced 2016-04-14 16:48 UTC
The name `df` does not conform to the variable naming conventions (`[a-z_][a-z0-9_]{2,30}$`). This check looks for invalid names for a range of different identifiers. You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements. If your project includes a Pylint configuration file, the settings contained in that file take precedence.

Line 74: return skchem.Mol.from_smiles(smiles)
Line 82: return skchem.Mol.from_smiles(smiles)
Bug, introduced 2016-01-19 16:21 UTC
The Class `Mol` does not seem to have a member named `from_smiles`. This check looks for calls to members that are non-existent. These calls will fail. The member could have been renamed or removed.
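
The naming message reported for `df` quotes the pattern `[a-z_][a-z0-9_]{2,30}$`. The short sketch below, using only the standard-library `re` module, shows why a two-character name fails that pattern while `lst` passes. The helper name `conforms` and the sample names other than `df` and `lst` are illustrative assumptions.

```python
import re

# Pattern quoted in the Pylint message. re.match anchors the start of the
# string and `$` anchors the end, so the whole name must fit the pattern.
VARIABLE_NAME = re.compile(r'[a-z_][a-z0-9_]{2,30}$')


def conforms(name):
    """Return True when `name` satisfies the quoted naming pattern."""
    return VARIABLE_NAME.match(name) is not None


# One leading character plus at least two more means three characters minimum.
results = {name: conforms(name) for name in ('df', 'lst', 'frame', 'DataFrame')}
print(results)
```

A project that prefers short names such as `df` can either rename the variable or relax the pattern through the `variable-rgx` setting in its Pylint configuration file.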
