NMRShiftDB2Converter.get_spectra()   C

Complexity

Conditions 7

Size

Total Lines 31

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 1
CRAP Score 43.8155

Importance

Changes 1
Bugs 0 Features 1
Metric Value
c 1
b 0
f 1
dl 0
loc 31
ccs 1
cts 11
cp 0.0909
rs 5.5
cc 7
crap 43.8155

2 Methods

Rating   Name   Duplication   Size   Complexity  
A NMRShiftDB2Converter.is_spectrum() 0 2 2
A NMRShiftDB2Converter.index_pair() 0 2 1
#! /usr/bin/env python
Coding Style: This module should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below is an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

To learn more about docstrings, see PEP 257: Docstring Conventions.
#
# Copyright (C) 2016 Rich Lewis <[email protected]>
# License: 3-clause BSD

import os
import logging
import itertools
from collections import defaultdict

import pandas as pd
Configuration: The import pandas could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run Pylint. When installing your modules, make sure to use the command for the correct Python version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.
import numpy as np
Configuration: The import numpy could not be resolved.
from sklearn import metrics
Configuration: The import sklearn could not be resolved.

from .base import Converter, default_pipeline, contiguous_order
from ... import io
from ... import utils
from ...cross_validation import SimThresholdSplit

LOGGER = logging.getLogger(__file__)

class NMRShiftDB2Converter(Converter):
Coding Style: This class should have a docstring.

    def __init__(self, directory, output_directory, output_filename='nmrshiftdb2.h5'):
Bug: The __init__ method of the super-class Converter is not called.

It is generally advisable to initialize the super-class by calling its __init__ method:

class SomeParent:
    def __init__(self):
        self.x = 1

class SomeChild(SomeParent):
    def __init__(self):
        # Initialize the super class
        SomeParent.__init__(self)
Comprehensibility: This function exceeds the maximum number of variables (17/15).

        output_path = os.path.join(output_directory, output_filename)
        input_path = os.path.join(directory, 'nmrshiftdb2.sdf')
        data = self.parse_data(input_path)

        ys = self.get_spectra(data)
Coding Style (Naming): The name ys does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.
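The naming pattern quoted in the message can be tried out directly. The following is a minimal sketch, assuming the pattern is applied with match semantics from the start of the name; the helper `conforms` is hypothetical and not part of Pylint.

```python
import re

# Pattern quoted in the naming message.
VARIABLE_NAME = re.compile(r'[a-z_][a-z0-9_]{2,30}$')


def conforms(name):
    """Return True if `name` satisfies the quoted naming pattern."""
    return VARIABLE_NAME.match(name) is not None


# Short names fail because the pattern needs at least three characters.
for name in ('ys', 'ms', 'y', 'cv', 'spectra'):
    print(name, conforms(name))
```

The pattern requires one leading character plus at least two more, so two-letter names such as ys, ms and cv are rejected regardless of how conventional they are in numerical code.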
        ys = self.process_spectra(ys)
Coding Style (Naming): The name ys does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
        ys = self.combine_duplicates(ys)
Coding Style (Naming): The name ys does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
        self.log_dists(ys)
        self.log_duplicates(ys)
        ys = self.squash_duplicates(ys)
Coding Style (Naming): The name ys does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).

        c13s = self.to_frame(ys.loc[ys['13c'].notnull(), '13c'])
        data = data[['structure']].join(c13s, how='right')

        ms, y = data.structure, data.drop('structure', axis=1)
Coding Style (Naming): The name ms does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style (Naming): The name y does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
        pipeline = default_pipeline()
        ms, y = pipeline.transform_filter(ms, y)
Coding Style (Naming): The name ms does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style (Naming): The name y does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
        y.columns.name = 'shifts'

        cv = SimThresholdSplit(min_threshold=0.6, block_width=4000, n_jobs=-1).fit(ms)
Coding Style (Naming): The name cv does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
        train, valid, test = cv.split((70, 15, 15))

        (ms, y, train, valid, test) = contiguous_order((ms, y, train, valid, test), (train, valid, test))
Coding Style: This line is too long as per the coding-style (105/100).

This check looks for lines that are too long. You can specify the maximum line length.

Coding Style (Naming): The name ms does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style (Naming): The name y does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
        splits = (('train', train), ('valid', valid), ('test', test))

        self.run(ms, y, output_path=output_path, splits=splits)

    @staticmethod
    def parse_data(filepath):

        """ Reads the raw datafile. """

        LOGGER.info('Reading file: %s', filepath)
        data = io.read_sdf(filepath, removeHs=False, warn_bad_mol=False)
        data.index = data['nmrshiftdb2 ID'].astype(int)
        data.index.name = 'nmrshiftdb2_id'
        data.columns = data.columns.to_series().apply(utils.free_to_snail)
        data = data.sort_index()
        LOGGER.info('Read %s molecules.', len(data))
        return data

    @staticmethod
    def get_spectra(data):

        """ Retrieves spectra from raw data. """

        LOGGER.info('Retrieving spectra from raw data...')
        isotopes = [
            '1h',
            '11b',
            '13c',
            '15n',
            '17o',
            '19f',
            '29si',
            '31p',
            '33s',
            '73ge',
            '195pt'
        ]

        def is_spectrum(col_name, ele='c'):
Coding Style: This function should have a docstring.
Unused Code: The argument ele seems to be unused.
            return any(isotope in col_name for isotope in isotopes)

        spectrum_cols = [c for c in data if is_spectrum(c)]
        data = data[spectrum_cols]

        def index_pair(s):
Coding Style (Naming): The name s does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style: This function should have a docstring.
            return s[0], int(s[1])

        data.columns = pd.MultiIndex.from_tuples([index_pair(i.split('_')[1:]) for i in data.columns])
Coding Style: This line is too long as per the coding-style (102/100).
        return data
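To illustrate how get_spectra() selects and relabels columns, here is a minimal sketch in plain Python. The column names are hypothetical, chosen only to match the prefix_isotope_number shape that the split('_')[1:] call implies; the real names depend on the SDF file and on utils.free_to_snail.

```python
ISOTOPES = ['1h', '11b', '13c', '15n', '17o', '19f',
            '29si', '31p', '33s', '73ge', '195pt']


def is_spectrum(col_name):
    """True if any isotope label occurs as a substring of the column name."""
    return any(isotope in col_name for isotope in ISOTOPES)


def index_pair(parts):
    """Turn ['13c', '0'] into the tuple ('13c', 0)."""
    return parts[0], int(parts[1])


# Hypothetical column names, for illustration only.
columns = ['structure', 'nmrshiftdb2_id',
           'spectrum_13c_0', 'spectrum_13c_1', 'spectrum_1h_0']

spectrum_cols = [c for c in columns if is_spectrum(c)]
pairs = [index_pair(c.split('_')[1:]) for c in spectrum_cols]
print(pairs)  # [('13c', 0), ('13c', 1), ('1h', 0)]
```

In the converter these tuples are passed to pd.MultiIndex.from_tuples, giving a two-level column index of isotope and spectrum number. Because is_spectrum is a substring test, any column whose name happens to contain an isotope label would also be selected.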

    @staticmethod
    def process_spectra(data):

        """ Turn the string representations found in sdf file into a dictionary. """

        def spectrum_dict(spectrum_string):
Coding Style: This function should have a docstring.
            if not isinstance(spectrum_string, str):
                return np.nan # no spectra are still nan
            if spectrum_string == '':
                return np.nan # empty spectra are nan
            sigs = spectrum_string.strip().strip('|').strip().split('|') # extract signals
            sig_tup = [tuple(s.split(';')) for s in sigs] # take tuples as (signal, coupling, atom)
            return {int(s[2]): float(s[0]) for s in sig_tup} # make spectrum a dictionary of atom to signal
Coding Style: This line is too long as per the coding-style (107/100).

        return data.applymap(spectrum_dict)
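The parsing step in process_spectra() can be exercised on its own. This is a minimal sketch that follows the same logic for a single cell; the example string is hypothetical, and math.nan stands in for np.nan so that the snippet needs only the standard library.

```python
import math


def spectrum_dict(spectrum_string):
    """Parse 'shift;coupling;atom|...' into {atom_index: shift}."""
    if not isinstance(spectrum_string, str) or spectrum_string == '':
        return math.nan  # missing or empty spectra stay NaN
    signals = spectrum_string.strip().strip('|').strip().split('|')
    fields = [tuple(signal.split(';')) for signal in signals]
    # Each tuple is (shift, coupling, atom); keep atom -> shift.
    return {int(f[2]): float(f[0]) for f in fields}


# Hypothetical spectrum string, for illustration only.
example = '17.6;0.0Q;9|18.3;0.0Q;10|'
print(spectrum_dict(example))  # {9: 17.6, 10: 18.3}
```

The coupling field is discarded, and if the same atom index appears twice in one string the later signal overwrites the earlier one.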

    @staticmethod
    def combine_duplicates(data):

        """ Collect duplicate spectra into one dictionary. All shifts are collected into lists. """

        def aggregate_dicts(ds):
Coding Style (Naming): The name ds does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style: This function should have a docstring.
            res = defaultdict(list)
            for d in ds:
Coding Style (Naming): The name d does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
                if not isinstance(d, dict): continue
Coding Style: More than one statement on a single line
                for k, v in d.items():
Coding Style (Naming): The name v does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
                    res[k].append(v)
            return dict(res) if len(res) else np.nan

        return data.groupby(level=0, axis=1).apply(lambda s: s.apply(aggregate_dicts, axis=1))
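The aggregation applied to each row can be sketched without pandas. The version below mirrors aggregate_dicts for one molecule, with longer argument names and the continue statement on its own line as the style checks request; the input dictionaries are hypothetical, and float('nan') stands in for np.nan.

```python
import math
from collections import defaultdict


def aggregate_dicts(spectra):
    """Merge {atom: shift} dicts into one {atom: [shift, ...]} dict."""
    res = defaultdict(list)
    for spectrum in spectra:
        if not isinstance(spectrum, dict):
            continue  # skip NaN placeholders for missing spectra
        for atom, shift in spectrum.items():
            res[atom].append(shift)
    return dict(res) if res else float('nan')


# Hypothetical duplicate measurements of the same molecule.
first = {1: 10.0, 2: 20.0}
second = {1: 10.4, 2: 20.2}
combined = aggregate_dicts([first, float('nan'), second])
print(combined)  # {1: [10.0, 10.4], 2: [20.0, 20.2]}
```

In the converter this function runs once per row within each isotope group, so every atom ends up with a list holding one shift per duplicate spectrum.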

    @staticmethod
    def squash_duplicates(data):

        """ Take the mean of all the duplicates.  This is where we could do a bit more checking. """

        def squash(d):
Coding Style (Naming): The name d does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style: This function should have a docstring.
            if not isinstance(d, dict):
                return np.nan
            else:
                return {k: np.mean(v) for k, v in d.items()}

        return data.applymap(squash)
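The averaging step can be checked in isolation. This is a minimal sketch of the per-cell function, using statistics.mean in place of np.mean so that it runs with the standard library alone; the input values are hypothetical.

```python
from statistics import mean


def squash(shift_lists):
    """Replace each list of duplicate shifts with its mean."""
    if not isinstance(shift_lists, dict):
        return float('nan')  # missing spectra stay NaN
    return {atom: mean(shifts) for atom, shifts in shift_lists.items()}


# Hypothetical combined spectrum: atom 1 measured twice, atom 2 once.
squashed = squash({1: [10.0, 11.0], 2: [20.0]})
print(squashed)  # {1: 10.5, 2: 20.0}
```

A plain mean treats every duplicate as equally reliable, which is presumably the "bit more checking" the docstring has in mind.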

    @staticmethod
    def to_frame(data):

        """ Convert a series of dictionaries to a dataframe. """
        res = pd.DataFrame(data.tolist(), index=data.index)
        res.columns.name = 'atom_idx'
        return res

    @staticmethod
    def extract_duplicates(data, kind='13c'):

        """ Get all 13c duplicates.  """

        def is_duplicate(ele):
Coding Style: This function should have a docstring.
            if not isinstance(ele, dict):
                return False
            else:
                return len(list(ele.values())[0]) > 1

        return data.loc[data[kind].apply(is_duplicate), kind]

    @staticmethod
    def log_dists(data):
Coding Style: This method should have a docstring.

        def n_spect(ele):
Coding Style: This function should have a docstring.
            return isinstance(ele, dict)

        def n_shifts(ele):
Coding Style: This function should have a docstring.
            return len(ele) if isinstance(ele, dict) else 0

        def log_message(func):
Coding Style: This function should have a docstring.
            return '  '.join('{k}: {v}'.format(k=k, v=v) for k, v in data.applymap(func).sum().to_dict().items())
Coding Style: This line is too long as per the coding-style (113/100).

        LOGGER.info('Number of spectra: %s', log_message(n_spect))
        LOGGER.info('Extracted shifts: %s', log_message(n_shifts))


    def log_duplicates(self, data):
Coding Style: This method should have a docstring.

        for kind in '1h', '13c':
            dups = self.extract_duplicates(data, kind)
            LOGGER.info('Number of duplicate %s spectra: %s', kind, len(dups))
            res = pd.DataFrame(sum((list(itertools.combinations(l, 2)) for s in dups for k, l in s.items()), []))
Coding Style: This line is too long as per the coding-style (113/100).
            LOGGER.info('Number of duplicate %s pairs: %f', kind, len(res))
            LOGGER.info('MAE for duplicate %s: %.4f', kind, metrics.mean_absolute_error(res[0], res[1]))
Coding Style: This line is too long as per the coding-style (104/100).
            LOGGER.info('MSE for duplicate %s: %.4f', kind, metrics.mean_squared_error(res[0], res[1]))
Coding Style: This line is too long as per the coding-style (103/100).
            LOGGER.info('r2 for duplicate %s: %.4f', kind, metrics.r2_score(res[0], res[1]))
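The pair construction in log_duplicates() is easier to follow when unrolled. The sketch below builds the same pairwise combinations with a nested comprehension instead of sum(..., []) and computes the mean absolute error with plain arithmetic rather than sklearn.metrics; the duplicate data is hypothetical.

```python
import itertools

# Hypothetical duplicate spectra: {atom: [shift measurements]}.
dups = [{1: [10.0, 10.5], 2: [20.0, 20.5, 21.0]}]

# Every unordered pair of measurements for the same atom.
pairs = [
    pair
    for spectrum in dups
    for shifts in spectrum.values()
    for pair in itertools.combinations(shifts, 2)
]

# Mean absolute difference between the two members of each pair.
mae = sum(abs(a - b) for a, b in pairs) / len(pairs)
print(len(pairs), mae)  # 4 0.625
```

An atom with n measurements contributes n * (n - 1) / 2 pairs, so heavily duplicated atoms dominate the reported error. The sketch divides by len(pairs) and therefore fails when there are no duplicates at all.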


if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG)
    LOGGER.info('Converting NMRShiftDB2 Dataset...')
    NMRShiftDB2Converter.convert()
Coding Style: Final newline missing