PhysPropConverter.fix_temp()   C

Complexity

Conditions 10

Size

Total Lines 19

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 2
CRAP Score 81.6196

Importance

Changes 1
Bugs 0 Features 1
Metric  Value
c       1
b       0
f       1
dl      0
loc     19
ccs     2
cts     19
cp      0.1053
rs      6
cc      10
crap    81.6196
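The crap value in the table can be reproduced from cc and cp. The sketch below assumes the commonly cited CRAP definition, complexity squared times the cube of the uncovered fraction, plus complexity; the function name crap_score is hypothetical, and cp is assumed to be the covered fraction (it matches ccs/cts = 2/19 rounded to four places).

```python
def crap_score(complexity, coverage):
    """CRAP = comp^2 * (1 - cov)^3 + comp, with coverage as a fraction in [0, 1]."""
    return complexity ** 2 * (1.0 - coverage) ** 3 + complexity


# cc = 10 and cp = 0.1053, taken from the metric table.
print(crap_score(10, 0.1053))
# With full coverage the score falls back to the complexity itself.
print(crap_score(10, 1.0))
```

Under this formula the reported score is dominated by the low coverage term, so adding tests for fix_temp() would lower it even before any refactoring.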

How to fix   Complexity   

Complexity

Complex methods like PhysPropConverter.fix_temp() often do a lot of different things. To break such a method down, we need to identify a cohesive component within it. For a class, a common approach is to look for fields/methods that share the same prefixes or suffixes; for a single method, look for consecutive statements that serve one purpose.

Once you have determined what belongs together, you can apply the Extract Method refactoring to a method, or Extract Class to a class. If a component of a class makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
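As an illustration of splitting fix_temp() into cohesive pieces, here is a minimal sketch, not the project's implementation. The helper names (_strip_annotation, _midpoint, _to_float) are hypothetical, math.nan stands in for np.nan so the example has no dependencies, and ranges are assumed to consist of plain decimal numbers. Two behaviours differ from the listed source on purpose: the 'dec' and 'sub' markers are removed as suffixes rather than via str.strip() with a character set, and a value such as '-5 dec' is parsed as a negative number instead of being split on the hyphen (where the listed code would call float('') and raise).

```python
import math
import re

# Two plain decimal numbers separated by a hyphen, e.g. "100-102" or "-10--5".
_RANGE = re.compile(r'^\s*(-?\d+(?:\.\d+)?)\s*-\s*(-?\d+(?:\.\d+)?)\s*$')


def _strip_annotation(text):
    """Drop a trailing 'dec' or 'sub' marker and surrounding whitespace."""
    text = text.strip()
    for suffix in ('dec', 'sub'):
        if text.endswith(suffix):
            text = text[:-len(suffix)].strip()
    return text


def _midpoint(text, mean_range):
    """Return None if text is not a range, NaN if it is too wide, else its mean."""
    match = _RANGE.match(text)
    if match is None:
        return None
    low, high = float(match.group(1)), float(match.group(2))
    if abs(high - low) < mean_range:
        return (low + high) / 2
    return math.nan


def _to_float(text):
    """Parse text as a float, mapping unparseable input to NaN."""
    try:
        return float(text)
    except ValueError:
        return math.nan


def fix_temp(value, mean_range=5):
    """Convert a raw temperature entry to a float, or NaN if it is unusable."""
    try:
        return float(value)
    except ValueError:
        pass
    if '<' in value or '>' in value:  # bounded values carry no point estimate
        return math.nan
    text = _strip_annotation(value)
    if mean_range:
        midpoint = _midpoint(text, mean_range)
        if midpoint is not None:
            return midpoint
    return _to_float(text)
```

Each helper now has one or two branches of its own, which keeps the per-function condition count low and lets each case be tested in isolation.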

#! /usr/bin/env python
Coding Style
This module should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below is an example for a method:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend reading PEP-257: Docstring Conventions.
#
# Copyright (C) 2016 Rich Lewis <[email protected]>
# License: 3-clause BSD

import os
import zipfile
import logging
LOGGER = logging.getLogger(__name__)

import pandas as pd
Configuration
The import pandas could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary install commands, replacing abc with the name of the missing package:

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3

Tip: We are currently not using virtualenv to run pylint. When installing your modules, make sure to use the command for the correct Python version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.
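To check the first cause locally, you can ask the interpreter that runs the analysis whether each module can be located. This is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical, and it is meant for top-level module names only.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The two imports flagged in this file; an empty list means both resolve.
print(missing_modules(['pandas', 'numpy']))
```

Run it with the same Python version that Pylint uses, since a package installed for one version is not visible to the other.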

import numpy as np
Configuration
The import numpy could not be resolved.

from ... import io
from .base import Converter, contiguous_order

from ...cross_validation import SimThresholdSplit

TXT_COLUMNS = [l.lower() for l in """CAS
Formula
Mol_Weight
Chemical_Name
WS
WS_temp
WS_type
WS_reference
LogP
LogP_temp
LogP_type
LogP_reference
VP
VP_temp
VP_type
VP_reference
DC_pKa
DC_temp
DC_type
DC_reference
henry_law Constant
HL_temp
HL_type
HL_reference
OH
OH_temp
OH_type
OH_reference
BP_pressure
MP
BP
FP""".split('\n')]

class PhysPropConverter(Converter):
Coding Style
This class should have a docstring.

    def __init__(self, directory, output_directory, output_filename='physprop.h5'):
Bug
The __init__ method of the super-class Converter is not called.

It is generally advisable to initialize the super-class by calling its __init__ method:

class SomeParent:
    def __init__(self):
        self.x = 1

class SomeChild(SomeParent):
    def __init__(self):
        # Initialize the super class
        SomeParent.__init__(self)
Comprehensibility
This function exceeds the maximum number of variables (16/15).

        output_path = os.path.join(output_directory, output_filename)

        sdf, txt = self.extract(directory)
        mols, data = self.process_sdf(sdf), self.process_txt(txt)

        LOGGER.debug('Compounds with data extracted: %s', len(data))

        data = mols.to_frame().join(data)
        data = self.drop_inconsistencies(data)

        y = self.process_targets(data)
Coding Style Naming
The name y does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.
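The pattern quoted in the message requires a lowercase letter or underscore followed by at least two more lowercase letters, digits, or underscores, so short or mixed-case names are rejected. A small sketch that applies the same pattern with Python's re module; the helper name conforms and the sample name log_s are hypothetical:

```python
import re

# Pattern copied from the issue message; re.match anchors it at the start.
NAME_PATTERN = re.compile(r'[a-z_][a-z0-9_]{2,30}$')


def conforms(name):
    """Return True if name satisfies the quoted naming convention."""
    return NAME_PATTERN.match(name) is not None


for name in ('y', 'ms', 'cv', 'logS', 'targets', 'log_s'):
    print(name, conforms(name))
```

Names such as y, ms, and cv fail on length alone, while logS fails because of the uppercase letter.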

        LOGGER.debug('Compounds with experimental: %s', len(y))

        data = data.ix[y.index]
        data.columns.name = 'targets'
        ms, y = data.structure, data.drop('structure', axis=1)
Coding Style Naming
The name ms does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style Naming
The name y does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).

        cv = SimThresholdSplit(min_threshold=0.6, block_width=4000, n_jobs=-1).fit(ms)
Coding Style Naming
The name cv does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
        train, valid, test = cv.split((70, 15, 15))

        (ms, y, train, valid, test) = contiguous_order((ms, y, train, valid, test), (train, valid, test))
Coding Style
This line is too long as per the coding-style (105/100).

This check looks for lines that are too long. You can specify the maximum line length.
Coding Style Naming
The name ms does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style Naming
The name y does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
        splits = (('train', train), ('valid', valid), ('test', test))

        self.run(ms, y, output_path=output_path, splits=splits)

    def extract(self, directory):
Coding Style
This method should have a docstring.
Coding Style
This method could be written as a function/class method.

If a method does not access any attributes of the class, it could also be implemented as a function or static method. This can help improve readability. For example,

class Foo:
    def some_method(self, x, y):
        return x + y

could be written as

class Foo:
    @classmethod
    def some_method(cls, x, y):
        return x + y
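When the method uses neither the instance nor the class, a static method is arguably the closer fit than a class method, and it is the form fix_temp() already uses in this file. A minimal sketch of the same Foo example with @staticmethod:

```python
class Foo:
    @staticmethod
    def some_method(x, y):
        # No self or cls parameter: the result depends only on the arguments.
        return x + y


# A static method can be called on the class or on an instance.
print(Foo.some_method(1, 2))
print(Foo().some_method(1, 2))
```

Existing calls of the form self.some_method(...) keep working after such a change, so call sites do not need to be touched.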
        LOGGER.info('Extracting from %s', directory)
        with zipfile.ZipFile(os.path.join(directory, 'phys_sdf.zip')) as f:
Coding Style Naming
The name f does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
            sdf = f.extract('PhysProp.sdf')
        with zipfile.ZipFile(os.path.join(directory, 'phys_txt.zip')) as f:
Coding Style Naming
The name f does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
            txt = f.extract('PhysProp.txt')
        return sdf, txt

    def process_sdf(self, path):
Coding Style
This method should have a docstring.
Coding Style
This method could be written as a function/class method.
        LOGGER.info('Processing sdf at %s', path)
        mols = io.read_sdf(path, read_props=False).structure
        mols.index = mols.apply(lambda m: m.GetProp('CAS'))
        mols.index.name = 'cas'
        LOGGER.debug('Structures extracted: %s', len(mols))
        return mols

    def process_txt(self, path):
Coding Style
This method should have a docstring.
Coding Style
This method could be written as a function/class method.
        LOGGER.info('Processing txt at %s', path)
        data = pd.read_table(path, header=None, engine='python').iloc[:, :32]
        data.columns = TXT_COLUMNS
        data_types = data.columns[[s.endswith('_type') for s in data.columns]]
        data[data_types] = data[data_types].fillna('NAN')
        data = data.set_index('cas')
        return data

    def drop_inconsistencies(self, data):
Coding Style
This method should have a docstring.
Coding Style
This method could be written as a function/class method.
        LOGGER.info('Dropping inconsistent data...')
        formula = data.structure.apply(lambda m: m.to_formula())
        LOGGER.info('Inconsistent compounds: %s', (formula != data.formula).sum())
        data = data[formula == data.formula]
        return data

    def process_targets(self, data):
Coding Style
This method should have a docstring.
        LOGGER.info('Dropping estimated data...')
        data = pd.concat([self.process_logS(data),
                          self.process_logP(data),
                          self.process_mp(data),
                          self.process_bp(data)], axis=1)
        LOGGER.info('Dropped compounds: %s', data.isnull().all(axis=1).sum())
        data = data[data.notnull().any(axis=1)]
        LOGGER.debug('Compounds with experimental activities: %s', len(data))
        return data

    def process_logS(self, data):
Coding Style Naming
The name process_logS does not conform to the method naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style
This method should have a docstring.
Coding Style
This method could be written as a function/class method.
        cleaned = pd.DataFrame(index=data.index)
Unused Code
The variable cleaned seems to be unused.
        S = 0.001 * data.ws / data.mol_weight
Coding Style Naming
The name S does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
        logS = np.log10(S)
Coding Style Naming
The name logS does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
        return logS[data.ws_type == 'EXP']

    def process_logP(self, data):
Coding Style Naming
The name process_logP does not conform to the method naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style
This method should have a docstring.
Coding Style
This method could be written as a function/class method.
        logP = data.logp[data.logp_type == 'EXP']
Coding Style Naming
The name logP does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
        return logP[logP > -10]

    def process_mp(self, data):
Coding Style
This method should have a docstring.
        return data.mp.apply(self.fix_temp)

    def process_bp(self, data):
Coding Style
This method should have a docstring.
        return data.bp.apply(self.fix_temp)

    @staticmethod
    def fix_temp(s, mean_range=5):
Coding Style Naming
The name s does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style
This method should have a docstring.
        try:
            return float(s)
        except ValueError:
            if '<' in s or '>' in s:
                return np.nan
            s = s.strip(' dec')
            s = s.strip(' sub')
            if '-' in s and mean_range:
                rng = [float(n) for n in s.split('-')]
                if len(rng) > 2:
                    return np.nan
                if np.abs(rng[1] - rng[0]) < mean_range:
                    return (rng[0] + rng[1])/2
            try:
                return float(s)
            except ValueError:
                return np.nan


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    LOGGER.info('Converting PhysProp Dataset...')
    PhysPropConverter.convert()