Completed
Push — master ( 01edc4...2b1c29 )
by Rich
01:29
created

NMRShiftDB2Converter   B

Complexity

Total Complexity 41

Size/Duplication

Total Lines 158
Duplicated Lines 0 %

Importance

Changes 1
Bugs 0 Features 1
Metric Value
wmc 41
c 1
b 0
f 1
dl 0
loc 158
rs 8.2769

20 Methods

Rating   Name   Duplication   Size   Complexity  
B log_dists() 0 13 6
A is_spectrum() 0 2 2
A __init__() 0 22 1
A index_pair() 0 2 1
A is_duplicate() 0 5 2
A plot_duplicates() 0 4 1
B combine_duplicates() 0 13 7
A squash_duplicates() 0 11 4
A log_duplicates() 0 10 4
A log_message() 0 2 2
B process_spectra() 0 14 6
B aggregate_dicts() 0 7 5
A squash() 0 5 3
A parse_data() 0 12 1
B spectrum_dict() 0 8 5
A to_frame() 0 6 1
A n_shifts() 0 2 2
C get_spectra() 0 30 7
A n_spect() 0 2 1
A extract_duplicates() 0 11 3

How to fix   Complexity   

Complex Class

Complex classes like NMRShiftDB2Converter often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common way to find such a component is to look for fields and methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
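As a rough illustration of Extract Class, the sketch below moves methods sharing a "duplicates" suffix into their own class and has the converter delegate to it. The names DuplicateHandler, SlimConverter and merge are hypothetical and do not appear in the reviewed file; the real methods operate on pandas objects, whereas plain dictionaries are assumed here to keep the example self-contained.

```python
from collections import defaultdict
from statistics import mean


class DuplicateHandler:
    """Hypothetical extracted class owning the duplicate-handling logic."""

    def combine(self, spectra):
        """Collect the shifts for each atom index across duplicate spectra."""
        combined = defaultdict(list)
        for spectrum in spectra:
            for atom_idx, shift in spectrum.items():
                combined[atom_idx].append(shift)
        return dict(combined)

    def squash(self, combined):
        """Average the collected shifts for each atom index."""
        return {atom_idx: mean(shifts) for atom_idx, shifts in combined.items()}


class SlimConverter:
    """Hypothetical converter that delegates duplicate handling."""

    def __init__(self, duplicates=None):
        self.duplicates = duplicates or DuplicateHandler()

    def merge(self, spectra):
        """Combine duplicate spectra, then reduce each atom to one shift."""
        return self.duplicates.squash(self.duplicates.combine(spectra))


merged = SlimConverter().merge([{1: 10.0, 2: 20.0}, {1: 12.0}])
```

With this split, the converter's own complexity drops by the methods that moved, and the duplicate logic can be tested without constructing a converter.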

#! /usr/bin/env python
Coding Style: This module should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below, you find an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend reading PEP-257: Docstring Conventions.
#
# Copyright (C) 2016 Rich Lewis <[email protected]>
# License: 3-clause BSD

import os
import logging
import itertools
from collections import defaultdict

import pandas as pd
Configuration: The import pandas could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint. When installing your modules, make sure to use the command for the correct Python version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

import numpy as np
Configuration: The import numpy could not be resolved.
from sklearn import metrics
Configuration: The import sklearn could not be resolved.

from .base import Converter
Configuration: Unable to import 'base' (invalid syntax (<string>, line 46))
from ... import io
from ... import utils

LOGGER = logging.getLogger(__file__)

class NMRShiftDB2Converter(Converter):
Coding Style: This class should have a docstring.

    def __init__(self, directory, output_directory, output_filename='nmrshiftdb2.h5'):

        output_path = os.path.join(output_directory, output_filename)
        input_path = os.path.join(directory, 'nmrshiftdb2.sdf')
        data = self.parse_data(input_path)

        ys = self.get_spectra(data)
Coding Style (Naming): The name ys does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.
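
To see why short names such as ys are flagged, the pattern quoted in the message can be tried directly. This is a minimal sketch that assumes the pattern is applied from the start of the name, as re.match does; the helper name conforms is hypothetical.

```python
import re

# Pattern copied from the message above: one leading character
# followed by 2 to 30 more, so a name needs at least 3 characters.
NAME_PATTERN = re.compile(r'[a-z_][a-z0-9_]{2,30}$')


def conforms(name):
    """Return True if the name satisfies the quoted naming pattern."""
    return NAME_PATTERN.match(name) is not None


results = {name: conforms(name) for name in ('ys', 'ms', 'y', 'c13s', 'spectra')}
```

Under this pattern ys, ms and y fail only because they are shorter than three characters, while c13s and spectra pass.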

        ys = self.process_spectra(ys)
Coding Style (Naming): The name ys does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
        ys = self.combine_duplicates(ys)
Coding Style (Naming): The name ys does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
        self.log_dists(ys)
        self.log_duplicates(ys)
        ys = self.squash_duplicates(ys)
Coding Style (Naming): The name ys does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).

        c13s = self.to_frame(ys.loc[ys['13c'].notnull(), '13c'])
        data = data[['structure']].join(c13s, how='right')

        data = self.standardize(data)
Bug: The Instance of NMRShiftDB2Converter does not seem to have a member named standardize.

This check looks for calls to members that are non-existent. These calls will fail.

The member could have been renamed or removed.

        data = self.filter(data)
Bug: The Instance of NMRShiftDB2Converter does not seem to have a member named filter.
        data = self.optimize(data)
Bug: The Instance of NMRShiftDB2Converter does not seem to have a member named optimize.

        ms, y = data.structure, data.drop('structure', axis=1)
Coding Style (Naming): The name ms does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style (Naming): The name y does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
        self.run(ms, y, output_path=output_path)
Bug: The Instance of NMRShiftDB2Converter does not seem to have a member named run.

    def parse_data(self, filepath):
Coding Style: This method could be written as a function/class method.

If a method does not access any attributes of the class, it could also be implemented as a function or static method. This can help improve readability. For example,

class Foo:
    def some_method(self, x, y):
        return x + y

could be written as

class Foo:
    @classmethod
    def some_method(cls, x, y):
        return x + y
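
The explanation above mentions a static method, while its example uses @classmethod. A minimal sketch of the static-method variant follows, reusing the illustrative Foo class from the example; it is not code from the reviewed file.

```python
class Foo:
    @staticmethod
    def some_method(x, y):
        """Add two values without touching instance or class state."""
        return x + y


# A static method can be called on the class or on an instance.
total_via_class = Foo.some_method(1, 2)
total_via_instance = Foo().some_method(1, 2)
```

A static method receives neither self nor cls, which makes it explicit that the body depends only on its arguments.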

        """ Reads the raw datafile. """

        LOGGER.info('Reading file: %s', filepath)
        data = io.read_sdf(filepath, removeHs=False, warn_bad_mol=False)
        data.index = data['nmrshiftdb2 ID'].astype(int)
        data.index.name = 'nmrshiftdb2_id'
        data.columns = data.columns.to_series().apply(utils.free_to_snail)
        data = data.sort_index()
        LOGGER.info('Read %s molecules.', len(data))
        return data

    def get_spectra(self, data):
Coding Style: This method could be written as a function/class method.

        """ Retrieves spectra from raw data. """

        LOGGER.info('Retrieving spectra from raw data...')
        isotopes = [
            '1h',
            '11b',
            '13c',
            '15n',
            '17o',
            '19f',
            '29si',
            '31p',
            '33s',
            '73ge',
            '195pt'
        ]

        def is_spectrum(col_name, ele='c'):
Coding Style: This function should have a docstring.
Unused Code: The argument ele seems to be unused.
            return any(isotope in col_name for isotope in isotopes)

        spectrum_cols = [c for c in data if is_spectrum(c)]
        data = data[spectrum_cols]

        def index_pair(s):
Coding Style (Naming): The name s does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style: This function should have a docstring.
            return s[0], int(s[1])

        data.columns = pd.MultiIndex.from_tuples([index_pair(i.split('_')[1:]) for i in data.columns])
Coding Style: This line is too long as per the coding-style (102/100).

This check looks for lines that are too long. You can specify the maximum line length.
        return data

    def process_spectra(self, data):
Coding Style: This method could be written as a function/class method.

        """ Turn the string representations found in sdf file into a dictionary. """

        def spectrum_dict(spectrum_string):
Coding Style: This function should have a docstring.
            if not isinstance(spectrum_string, str):
                return np.nan # no spectra are still nan
            if spectrum_string == '':
                return np.nan # empty spectra are nan
            sigs = spectrum_string.strip().strip('|').strip().split('|') # extract signals
            sig_tup = [tuple(s.split(';')) for s in sigs] # take tuples as (signal, coupling, atom)
            return {int(s[2]): float(s[0]) for s in sig_tup} # make spectrum a dictionary of atom to signal
Coding Style: This line is too long as per the coding-style (107/100).

        return data.applymap(spectrum_dict)
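
The parsing step in process_spectra can be tried in isolation. The sketch below restates the same string handling as a standalone function; the name parse_spectrum and the sample string are hypothetical, and the field order (signal, coupling, atom) is taken from the comments in the listing rather than from any format specification.

```python
def parse_spectrum(spectrum_string):
    """Map atom index to shift for a 'signal;coupling;atom|...' string."""
    # Drop surrounding whitespace and the trailing '|' before splitting.
    signals = spectrum_string.strip().strip('|').strip().split('|')
    fields = [signal.split(';') for signal in signals]
    return {int(atom): float(shift) for shift, _coupling, atom in fields}


# Hypothetical input laid out as the comments in the listing describe.
example = parse_spectrum('17.6;0.0Q;9|18.3;0.0Q;8|')
```

Unlike the method in the listing, this sketch does not handle missing or empty spectra; those cases are returned as NaN before the split in the original.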

    def combine_duplicates(self, data):
Coding Style: This method could be written as a function/class method.

        """ Collect duplicate spectra into one dictionary. All shifts are collected into lists. """

        def aggregate_dicts(ds):
Coding Style (Naming): The name ds does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style: This function should have a docstring.
            res = defaultdict(list)
            for d in ds:
Coding Style (Naming): The name d does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
                if not isinstance(d, dict): continue
Coding Style: More than one statement on a single line
                for k, v in d.items():
Coding Style (Naming): The name v does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
                    res[k].append(v)
            return dict(res) if len(res) else np.nan

        return data.groupby(level=0, axis=1).apply(lambda s: s.apply(aggregate_dicts, axis=1))

    def squash_duplicates(self, data):
Coding Style: This method could be written as a function/class method.

        """ Take the mean of all the duplicates.  This is where we could do a bit more checking. """

        def squash(d):
Coding Style (Naming): The name d does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style: This function should have a docstring.
            if not isinstance(d, dict):
                return np.nan
            else:
                return {k: np.mean(v) for k, v in d.items()}

        return data.applymap(squash)

    def to_frame(self, data):
Coding Style: This method could be written as a function/class method.

        """ Convert a series of dictionaries to a dataframe. """
        res = pd.DataFrame(data.tolist(), index=data.index)
        res.columns.name = 'atom_idx'
        return res

    def extract_duplicates(self, data, kind='13c'):
Coding Style: This method could be written as a function/class method.

        """ Get all 13c duplicates.  """

        def is_duplicate(ele):
Coding Style: This function should have a docstring.
            if not isinstance(ele, dict):
                return False
            else:
                return len(list(ele.values())[0]) > 1

        return data.loc[data[kind].apply(is_duplicate), kind]

    def log_dists(self, data):
Coding Style: This method should have a docstring.
Coding Style: This method could be written as a function/class method.

        def n_spect(ele):
Coding Style: This function should have a docstring.
            return isinstance(ele, dict)

        def n_shifts(ele):
Coding Style: This function should have a docstring.
            return len(ele) if isinstance(ele, dict) else 0

        def log_message(func):
Coding Style: This function should have a docstring.
            return '  '.join('{k}: {v}'.format(k=k, v=v) for k, v in data.applymap(func).sum().to_dict().items())
Coding Style: This line is too long as per the coding-style (113/100).

        LOGGER.info('Number of spectra: %s', log_message(n_spect))
        LOGGER.info('Extracted shifts: %s', log_message(n_shifts))

    def log_duplicates(self, data):
Coding Style: This method should have a docstring.

        for kind in '1h', '13c':
            dups = self.extract_duplicates(data, kind)
            LOGGER.info('Number of duplicate %s spectra: %s', kind, len(dups))
            res = pd.DataFrame(sum((list(itertools.combinations(l, 2)) for s in dups for k, l in s.items()), []))
Coding Style: This line is too long as per the coding-style (113/100).
            LOGGER.info('Number of duplicate %s pairs: %f', kind, len(res))
            LOGGER.info('MAE for duplicate %s: %.4f', kind, metrics.mean_absolute_error(res[0], res[1]))
Coding Style: This line is too long as per the coding-style (104/100).
            LOGGER.info('MSE for duplicate %s: %.4f', kind, metrics.mean_squared_error(res[0], res[1]))
Coding Style: This line is too long as per the coding-style (103/100).
            LOGGER.info('r2 for duplicate %s: %.4f', kind, metrics.r2_score(res[0], res[1]))

    def plot_duplicates(self, data):

        """ Plot the duplicates """
        pass



if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG)
    LOGGER.info('Converting NMRShiftDB2 Dataset...')
    NMRShiftDB2Converter.convert()
Coding Style: Final newline missing
Bug: The Class NMRShiftDB2Converter does not seem to have a member named convert.