Completed
Push — master ( 4d243d...e4a84f )
by Rich
01:28
created

PhysPropConverter.fix_temp()   C

Complexity

Conditions 10

Size

Total Lines 19

Duplication

Lines 0
Ratio 0 %

Importance

Changes 1
Bugs 0 Features 1
Metric Value
c 1
b 0
f 1
dl 0
loc 19
rs 6
cc 10

How to fix: Complexity

Complex code units like PhysPropConverter.fix_temp() often do several different things. To break such a unit down, we need to identify a cohesive component within it. A common approach to finding such a component is to look for fields or methods that share the same prefix or suffix.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
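
Because fix_temp() is a single static method rather than a class, the closest applicable refactoring is probably Extract Method. The following is a minimal sketch under stated assumptions, not the project's implementation: the helper names `_to_float`, `_strip_annotation` and `_mean_of_range` are hypothetical, and `math.nan` stands in for `np.nan` so the example runs without NumPy.

```python
import math

# Hypothetical helpers; math.nan stands in for np.nan in this sketch.
_ANNOTATIONS = (' dec', ' sub')


def _to_float(text):
    """Return float(text), or None if text is not a plain number."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def _strip_annotation(text):
    """Remove one trailing ' dec' or ' sub' marker, if present."""
    for suffix in _ANNOTATIONS:
        if text.endswith(suffix):
            return text[:-len(suffix)]
    return text


def _mean_of_range(text, mean_range):
    """Return the midpoint of 'low-high' if narrower than mean_range, else None."""
    parts = text.split('-')
    if len(parts) != 2:
        return None
    low, high = _to_float(parts[0]), _to_float(parts[1])
    if low is None or high is None:
        return None
    if abs(high - low) < mean_range:
        return (low + high) / 2
    return None


def fix_temp(s, mean_range=5):
    """Parse a temperature string, returning NaN when it cannot be interpreted."""
    value = _to_float(s)
    if value is not None:
        return value
    if not isinstance(s, str) or '<' in s or '>' in s:
        return math.nan
    s = _strip_annotation(s)
    if mean_range:
        mean = _mean_of_range(s, mean_range)
        if mean is not None:
            return mean
    value = _to_float(s)
    return math.nan if value is None else value
```

Each helper carries only a few conditions, which is the point of the exercise. Note that this sketch is not behavior-identical to the original: it strips a literal suffix rather than a character set, and an input such as '-5 dec' yields -5.0 here, whereas the original would attempt float('') on the split result.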

#! /usr/bin/env python
Coding Style: This module should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below is an example for a method:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend reading PEP 257: Docstring Conventions.

#
# Copyright (C) 2016 Rich Lewis <[email protected]>
# License: 3-clause BSD

import os
import zipfile
import logging
logger = logging.getLogger(__name__)
Coding Style (Naming): The name logger does not conform to the constant naming conventions ((([A-Z_][A-Z0-9_]*)|(__.*__))$).

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.
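
Module-level loggers are conventionally lowercase, so one option is to keep the name and silence the check on that single line. A minimal sketch, assuming Pylint's inline `# pylint: disable=invalid-name` pragma is acceptable in this project; the constant-style name `LOGGER` is only illustrative:

```python
import logging

# Option 1: keep the conventional lowercase name and silence the check here.
logger = logging.getLogger(__name__)  # pylint: disable=invalid-name

# Option 2 (illustrative): a name that matches the constant naming regex.
LOGGER = logging.getLogger(__name__)
```

Both names refer to the same logger object, since logging.getLogger() returns one logger per name.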

import pandas as pd
Configuration: The import pandas could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We currently do not use virtualenv to run Pylint. When installing your modules, make sure to use the command for the correct Python version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

import numpy as np
Configuration: The import numpy could not be resolved.


from ... import io
from ... import standardizers
Unused Code: The import standardizers seems to be unused.
from .base import Converter

TXT_COLUMNS = [l.lower() for l in """CAS
Formula
Mol_Weight
Chemical_Name
WS
WS_temp
WS_type
WS_reference
LogP
LogP_temp
LogP_type
LogP_reference
VP
VP_temp
VP_type
VP_reference
DC_pKa
DC_temp
DC_type
DC_reference
henry_law Constant
HL_temp
HL_type
HL_reference
OH
OH_temp
OH_type
OH_reference
BP_pressure
MP
BP
FP""".split('\n')]


class PhysPropConverter(Converter):
Coding Style: This class should have a docstring.

    def __init__(self, directory, output_directory, output_filename='physprop.h5'):
Bug: The __init__ method of the super-class Converter is not called.

It is generally advisable to initialize the super-class by calling its __init__ method:

class SomeParent:
    def __init__(self):
        self.x = 1

class SomeChild(SomeParent):
    def __init__(self):
        # Initialize the super class
        SomeParent.__init__(self)
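
On Python 3, the same initialization is usually written with `super()`. A minimal sketch: the classes below are hypothetical stand-ins, since the real `Converter.__init__` signature is not shown in this file.

```python
class BaseConverter:
    """Hypothetical stand-in for the real base class."""

    def __init__(self):
        self.initialized = True


class ExampleConverter(BaseConverter):
    def __init__(self, directory):
        # Initialize the super-class before doing subclass-specific work.
        super().__init__()
        self.directory = directory


converter = ExampleConverter('data')
```

If the code must also run on Python 2, the explicit form `super(ExampleConverter, self).__init__()` would be needed instead.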

        output_path = os.path.join(output_directory, output_filename)

        sdf, txt = self.extract(directory)
        mols, data = self.process_sdf(sdf), self.process_txt(txt)

        logger.debug('Compounds with data extracted: %s', len(data))

        data = mols.to_frame().join(data)
        data = self.drop_inconsistencies(data)

        data = self.standardize(data)

        data = self.filter(data)

        y = self.process_targets(data)
Coding Style (Naming): The name y does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).

        logger.debug('Compounds with experimental: %s', len(y))
        ms = data.structure[y.index]
Coding Style (Naming): The name ms does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
        self.run(ms, y, output_path=output_path, contiguous=True)

    def extract(self, directory):
Coding Style: This method should have a docstring.
Coding Style: This method could be written as a function/class method.

If a method does not access any attributes of the class, it could also be implemented as a function, class method, or static method. This can help improve readability. For example,

class Foo:
    def some_method(self, x, y):
        return x + y

could be written as

class Foo:
    @classmethod
    def some_method(cls, x, y):
        return x + y
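
When a method uses neither the instance nor the class, `@staticmethod` is arguably a closer fit than `@classmethod`; this file already uses it for fix_temp(). A minimal sketch with a hypothetical method name, using plain lists instead of pandas objects:

```python
class Foo:
    @staticmethod
    def keep_above(values, threshold=-10):
        """Return the values strictly greater than threshold."""
        return [v for v in values if v > threshold]


# Callable on the class or on an instance; no self or cls is passed.
print(Foo.keep_above([-12.0, 0.5, 3.1]))
```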

        logger.info('Extracting from %s', directory)
        with zipfile.ZipFile(os.path.join(directory, 'phys_sdf.zip')) as f:
Coding Style (Naming): The name f does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
            sdf = f.extract('PhysProp.sdf')
        with zipfile.ZipFile(os.path.join(directory, 'phys_txt.zip')) as f:
Coding Style (Naming): The name f does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
            txt = f.extract('PhysProp.txt')
        return sdf, txt

    def process_sdf(self, path):
Coding Style: This method should have a docstring.
Coding Style: This method could be written as a function/class method.
        logger.info('Processing sdf at %s', path)
        mols = io.read_sdf(path, read_props=False).structure
        mols.index = mols.apply(lambda m: m.GetProp('CAS'))
        mols.index.name = 'cas'
        logger.debug('Structures extracted: %s', len(mols))
        return mols

    def process_txt(self, path):
Coding Style: This method should have a docstring.
Coding Style: This method could be written as a function/class method.
        logger.info('Processing txt at %s', path)
        data = pd.read_table(path, header=None, engine='python').iloc[:, :32]
        data.columns = TXT_COLUMNS
        data_types = data.columns[[s.endswith('_type') for s in data.columns]]
        data[data_types] = data[data_types].fillna('NAN')
        data = data.set_index('cas')
        return data

    def drop_inconsistencies(self, data):
Coding Style: This method should have a docstring.
Coding Style: This method could be written as a function/class method.
        logger.info('Dropping inconsistent data...')
        formula = data.structure.apply(lambda m: m.to_formula())
        logger.info('Inconsistent compounds: %s', (formula != data.formula).sum())
        data = data[formula == data.formula]
        return data

    def process_targets(self, data):
Coding Style: This method should have a docstring.
        logger.info('Dropping estimated data...')
        data = pd.concat([self.process_logS(data),
                          self.process_logP(data),
                          self.process_mp(data),
                          self.process_bp(data)], axis=1)
        logger.info('Dropped compounds: %s', data.isnull().all(axis=1).sum())
        data = data[data.notnull().any(axis=1)]
        logger.debug('Compounds with experimental activities: %s', len(data))
        return data

    def process_logS(self, data):
Coding Style (Naming): The name process_logS does not conform to the method naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style: This method should have a docstring.
Coding Style: This method could be written as a function/class method.
        cleaned = pd.DataFrame(index=data.index)
Unused Code: The variable cleaned seems to be unused.
        S = 0.001 * data.ws / data.mol_weight
Coding Style (Naming): The name S does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
        logS = np.log10(S)
Coding Style (Naming): The name logS does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
        return logS[data.ws_type == 'EXP']

    def process_logP(self, data):
Coding Style (Naming): The name process_logP does not conform to the method naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style: This method should have a docstring.
Coding Style: This method could be written as a function/class method.
        logP = data.logp[data.logp_type == 'EXP']
Coding Style (Naming): The name logP does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
        return logP[logP > -10]

    def process_mp(self, data):
Coding Style: This method should have a docstring.
        return data.mp.apply(self.fix_temp)

    def process_bp(self, data):
Coding Style: This method should have a docstring.
        return data.bp.apply(self.fix_temp)

    @staticmethod
    def fix_temp(s, mean_range=5):
Coding Style (Naming): The name s does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style: This method should have a docstring.
        try:
            return float(s)
        except ValueError:
            if '<' in s or '>' in s:
                return np.nan
            s = s.strip(' dec')
            s = s.strip(' sub')
            if '-' in s and mean_range:
                rng = [float(n) for n in s.split('-')]
                if len(rng) > 2:
                    return np.nan
                if np.abs(rng[1] - rng[0]) < mean_range:
                    return (rng[0] + rng[1])/2
            try:
                return float(s)
            except ValueError:
                return np.nan
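
One detail in fix_temp() that may be worth checking: `str.strip(' dec')` removes any of the characters space, `d`, `e` and `c` from both ends of the string, not the literal suffix `' dec'`. The sketch below illustrates the difference with made-up input strings; `strip_suffix` is a hypothetical helper, not part of the project.

```python
def strip_suffix(text, suffix):
    """Remove suffix from the end of text only if it is literally present."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


# strip() treats its argument as a set of characters to trim from both ends.
print('150 dec'.strip(' dec'))            # same result as a suffix strip here
print('decane 12'.strip(' dec'))          # leading 'd', 'e', 'c' are trimmed too
print(strip_suffix('decane 12', ' dec'))  # unchanged: no literal ' dec' suffix
```

For typical temperature strings the two approaches agree, so this is a robustness consideration rather than a confirmed defect.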


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    PhysPropConverter.convert()