1 | #! /usr/bin/env python |
||
2 | # |
||
3 | # Copyright (C) 2016 Rich Lewis <[email protected]> |
||
4 | # License: 3-clause BSD |
||
5 | |||
6 | 1 | import os |
|
7 | 1 | import logging |
|
8 | 1 | import itertools |
|
9 | 1 | from collections import defaultdict |
|
10 | |||
11 | 1 | import pandas as pd |
|
The import pandas could not be resolved.
This can be caused by one of the following:
1. Missing dependencies. This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.
# .scrutinizer.yml
before_commands:
  - sudo pip install abc # Python2
  - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint; when installing your modules, make sure to use the command for the correct version.
2. Missing __init__.py files. This error could also result from missing |
|||
12 | 1 | import numpy as np |
|
The import numpy could not be resolved. |
|||
13 | 1 | from sklearn import metrics |
|
The import sklearn could not be resolved. |
|||
14 | |||
15 | 1 | from .base import Converter, default_pipeline, contiguous_order |
|
16 | 1 | from ... import io |
|
17 | 1 | from ... import utils |
|
18 | 1 | from ...cross_validation import SimThresholdSplit |
|
19 | |||
20 | 1 | LOGGER = logging.getLogger(__file__) |
|
21 | |||
22 | 1 | class NMRShiftDB2Converter(Converter): |
|
This class should have a docstring.
The coding style of this project requires that you add a docstring to this code element. Below, you find an example for methods:
class SomeClass:
    def some_method(self):
        """Do x and return foo."""
If you would like to know more about docstrings, we recommend reading PEP-257: Docstring Conventions. |
|||
23 | |||
24 | 1 | def __init__(self, directory, output_directory, output_filename='nmrshiftdb2.h5'): |
|
The __init__ method of the super-class Converter is not called.
It is generally advisable to initialize the super-class by calling its __init__ method:
class SomeParent:
    def __init__(self):
        self.x = 1

class SomeChild(SomeParent):
    def __init__(self):
        # Initialize the super class
        SomeParent.__init__(self) |
|||
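A minimal sketch of what the warning above guards against, using hypothetical classes (`Parent`, `ForgetfulChild`, `CarefulChild`) rather than the project's `Converter`, whose constructor signature is not shown here. It assumes Python 3, where `super().__init__()` is the usual spelling of `Parent.__init__(self)`.

```python
class Parent:
    def __init__(self):
        self.x = 1


class ForgetfulChild(Parent):
    def __init__(self):
        # Parent.__init__ is never called, so self.x is never set.
        self.y = 2


class CarefulChild(Parent):
    def __init__(self):
        # Initialize the super class first, then add child state.
        super().__init__()
        self.y = 2


# The forgetful child lacks the attribute the parent would have set.
print(hasattr(ForgetfulChild(), 'x'))
print(hasattr(CarefulChild(), 'x'))
```

Whether skipping the call matters for `NMRShiftDB2Converter` depends on what `Converter.__init__` does, which this listing does not show.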
25 | |||
26 | output_path = os.path.join(output_directory, output_filename) |
||
27 | input_path = os.path.join(directory, 'nmrshiftdb2.sdf') |
||
28 | data = self.parse_data(input_path) |
||
29 | |||
30 | ys = self.get_spectra(data) |
||
The name ys does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
This check looks for invalid names for a range of different identifiers. You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements. If your project includes a Pylint configuration file, the settings contained in that file take precedence. To find out more about Pylint, please refer to their site. |
|||
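As a rough illustration of the naming check, the sketch below applies the pattern quoted in the issue text to names used in this file. The helper `conforms` is hypothetical and not part of Pylint; it assumes the pattern is anchored at the start of the name, which is what `re.match` does.

```python
import re

# Pattern quoted in the issue text. The trailing `$` anchors the end of the
# name; re.match anchors the start (assumed to mirror the checker's behaviour).
NAME_PATTERN = re.compile(r'[a-z_][a-z0-9_]{2,30}$')


def conforms(name):
    """Return True if `name` satisfies the quoted pattern (hypothetical helper)."""
    return NAME_PATTERN.match(name) is not None


# The pattern needs one leading character plus at least two more, so any
# name shorter than three characters is rejected.
for name in ('ys', 'ms', 'y', 'cv', 'spectra', 'molecules'):
    print(name, conforms(name))
```

Short names such as `ys` or `cv` are common in scientific Python code, so a project may prefer to relax the pattern in its Pylint configuration rather than rename every variable.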
31 | ys = self.process_spectra(ys) |
||
The name ys does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$). |
|||
32 | ys = self.combine_duplicates(ys) |
||
The name ys does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$). |
|||
33 | self.log_dists(ys) |
||
34 | self.log_duplicates(ys) |
||
35 | ys = self.squash_duplicates(ys) |
||
The name ys does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$). |
|||
36 | |||
37 | c13s = self.to_frame(ys.loc[ys['13c'].notnull(), '13c']) |
||
38 | data = data[['structure']].join(c13s, how='right') |
||
39 | |||
40 | ms, y = data.structure, data.drop('structure', axis=1) |
||
The name ms does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
The name y does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$). |
|||
41 | pipeline = default_pipeline() |
||
42 | ms, y = pipeline.transform_filter(ms, y) |
||
The name ms does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
The name y does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$). |
|||
43 | y.columns.name = 'shifts' |
||
44 | |||
45 | cv = SimThresholdSplit(min_threshold=0.6, block_width=4000, n_jobs=-1).fit(ms) |
||
The name cv does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$). |
|||
46 | train, valid, test = cv.split((70, 15, 15)) |
||
47 | |||
48 | (ms, y, train, valid, test) = contiguous_order((ms, y, train, valid, test), (train, valid, test)) |
||
The name ms does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
The name y does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$). |
|||
49 | splits = (('train', train), ('valid', valid), ('test', test)) |
||
50 | |||
51 | self.run(ms, y, output_path=output_path, splits=splits) |
||
52 | |||
53 | 1 | @staticmethod |
|
54 | def parse_data(filepath): |
||
55 | |||
56 | """ Reads the raw datafile. """ |
||
57 | |||
58 | LOGGER.info('Reading file: %s', filepath) |
||
59 | data = io.read_sdf(filepath, removeHs=False, warn_bad_mol=False) |
||
60 | data.index = data['nmrshiftdb2 ID'].astype(int) |
||
61 | data.index.name = 'nmrshiftdb2_id' |
||
62 | data.columns = data.columns.to_series().apply(utils.free_to_snail) |
||
63 | data = data.sort_index() |
||
64 | LOGGER.info('Read %s molecules.', len(data)) |
||
65 | return data |
||
66 | |||
67 | 1 | @staticmethod |
|
68 | def get_spectra(data): |
||
69 | |||
70 | """ Retrieves spectra from raw data. """ |
||
71 | |||
72 | LOGGER.info('Retrieving spectra from raw data...') |
||
73 | isotopes = [ |
||
74 | '1h', |
||
75 | '11b', |
||
76 | '13c', |
||
77 | '15n', |
||
78 | '17o', |
||
79 | '19f', |
||
80 | '29si', |
||
81 | '31p', |
||
82 | '33s', |
||
83 | '73ge', |
||
84 | '195pt' |
||
85 | ] |
||
86 | |||
87 | def is_spectrum(col_name, ele='c'): |
||
This function should have a docstring. |
|||
88 | return any(isotope in col_name for isotope in isotopes) |
||
89 | |||
90 | spectrum_cols = [c for c in data if is_spectrum(c)] |
||
91 | data = data[spectrum_cols] |
||
92 | |||
93 | def index_pair(s): |
||
The name s does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
This function should have a docstring. |
|||
94 | return s[0], int(s[1]) |
||
95 | |||
96 | data.columns = pd.MultiIndex.from_tuples([index_pair(i.split('_')[1:]) for i in data.columns]) |
||
97 | return data |
||
98 | |||
99 | 1 | @staticmethod |
|
100 | def process_spectra(data): |
||
101 | |||
102 | """ Turn the string representations found in sdf file into a dictionary. """ |
||
103 | |||
104 | def spectrum_dict(spectrum_string): |
||
This function should have a docstring. |
|||
105 | if not isinstance(spectrum_string, str): |
||
106 | return np.nan # no spectra are still nan |
||
107 | if spectrum_string == '': |
||
108 | return np.nan # empty spectra are nan |
||
109 | sigs = spectrum_string.strip().strip('|').strip().split('|') # extract signals |
||
110 | sig_tup = [tuple(s.split(';')) for s in sigs] # take tuples as (signal, coupling, atom) |
||
111 | return {int(s[2]): float(s[0]) for s in sig_tup} # make spectrum a dictionary of atom to signal |
||
112 | |||
113 | return data.applymap(spectrum_dict) |
||
114 | |||
115 | 1 | @staticmethod |
|
116 | def combine_duplicates(data): |
||
117 | |||
118 | """ Collect duplicate spectra into one dictionary. All shifts are collected into lists. """ |
||
119 | |||
120 | def aggregate_dicts(ds): |
||
The name ds does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
This function should have a docstring. |
|||
121 | res = defaultdict(list) |
||
122 | for d in ds: |
||
The name d does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$). |
|||
123 | if not isinstance(d, dict): continue |
||
124 | for k, v in d.items(): |
||
The name v does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$). |
|||
125 | res[k].append(v) |
||
126 | return dict(res) if len(res) else np.nan |
||
127 | |||
128 | return data.groupby(level=0, axis=1).apply(lambda s: s.apply(aggregate_dicts, axis=1)) |
||
129 | |||
130 | 1 | @staticmethod |
|
131 | def squash_duplicates(data): |
||
132 | |||
133 | """ Take the mean of all the duplicates. This is where we could do a bit more checking. """ |
||
134 | |||
135 | def squash(d): |
||
The name d does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
This function should have a docstring. |
|||
136 | if not isinstance(d, dict): |
||
137 | return np.nan |
||
138 | else: |
||
139 | return {k: np.mean(v) for k, v in d.items()} |
||
140 | |||
141 | return data.applymap(squash) |
||
142 | |||
143 | 1 | @staticmethod |
|
144 | def to_frame(data): |
||
145 | |||
146 | """ Convert a series of dictionaries to a dataframe. """ |
||
147 | res = pd.DataFrame(data.tolist(), index=data.index) |
||
148 | res.columns.name = 'atom_idx' |
||
149 | return res |
||
150 | |||
151 | 1 | @staticmethod |
|
152 | 1 | def extract_duplicates(data, kind='13c'): |
|
153 | |||
154 | """ Get all 13c duplicates. """ |
||
155 | |||
156 | def is_duplicate(ele): |
||
This function should have a docstring. |
|||
157 | if not isinstance(ele, dict): |
||
158 | return False |
||
159 | else: |
||
160 | return len(list(ele.values())[0]) > 1 |
||
161 | |||
162 | return data.loc[data[kind].apply(is_duplicate), kind] |
||
163 | |||
164 | 1 | @staticmethod |
|
165 | def log_dists(data): |
||
This method should have a docstring. |
|||
166 | |||
167 | def n_spect(ele): |
||
This function should have a docstring. |
|||
168 | return isinstance(ele, dict) |
||
169 | |||
170 | def n_shifts(ele): |
||
This function should have a docstring. |
|||
171 | return len(ele) if isinstance(ele, dict) else 0 |
||
172 | |||
173 | def log_message(func): |
||
This function should have a docstring. |
|||
174 | return ' '.join('{k}: {v}'.format(k=k, v=v) for k, v in data.applymap(func).sum().to_dict().items()) |
||
175 | |||
176 | LOGGER.info('Number of spectra: %s', log_message(n_spect)) |
||
177 | LOGGER.info('Extracted shifts: %s', log_message(n_shifts)) |
||
178 | |||
179 | |||
180 | 1 | def log_duplicates(self, data): |
|
This method should have a docstring. |
|||
181 | |||
182 | for kind in '1h', '13c': |
||
183 | dups = self.extract_duplicates(data, kind) |
||
184 | LOGGER.info('Number of duplicate %s spectra: %s', kind, len(dups)) |
||
185 | res = pd.DataFrame(sum((list(itertools.combinations(l, 2)) for s in dups for k, l in s.items()), [])) |
||
186 | LOGGER.info('Number of duplicate %s pairs: %d', kind, len(res)) |
||
187 | LOGGER.info('MAE for duplicate %s: %.4f', kind, metrics.mean_absolute_error(res[0], res[1])) |
||
188 | LOGGER.info('MSE for duplicate %s: %.4f', kind, metrics.mean_squared_error(res[0], res[1])) |
||
189 | LOGGER.info('r2 for duplicate %s: %.4f', kind, metrics.r2_score(res[0], res[1])) |
||
190 | |||
191 | |||
192 | 1 | if __name__ == '__main__': |
|
193 | logging.basicConfig(level=logging.DEBUG) |
||
194 | LOGGER.info('Converting NMRShiftDB2 Dataset...') |
||
195 | NMRShiftDB2Converter.convert() |
||
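The spectrum handling in `process_spectra`, `combine_duplicates` and `squash_duplicates` can be sketched without pandas as follows. This is an illustration under assumptions, not the converter itself: the sample strings are made up, the `shift;coupling;atom|` layout is inferred from the comments in `spectrum_dict`, and `None` stands in for the `np.nan` the original returns for missing spectra.

```python
from collections import defaultdict


def spectrum_dict(spectrum_string):
    """Parse 'shift;coupling;atom|...' into {atom index: shift}."""
    if not isinstance(spectrum_string, str) or spectrum_string == '':
        return None  # missing or empty spectrum (np.nan in the original)
    signals = spectrum_string.strip().strip('|').strip().split('|')
    parts = [signal.split(';') for signal in signals]
    return {int(part[2]): float(part[0]) for part in parts}


def aggregate_dicts(dicts):
    """Collect the shifts of duplicate spectra into one list per atom."""
    res = defaultdict(list)
    for spectrum in dicts:
        if not isinstance(spectrum, dict):
            continue  # skip missing spectra
        for atom, shift in spectrum.items():
            res[atom].append(shift)
    return dict(res) if res else None


def squash(combined):
    """Replace each atom's list of shifts with its mean."""
    if not isinstance(combined, dict):
        return None
    return {atom: sum(shifts) / len(shifts) for atom, shifts in combined.items()}


# Two made-up duplicate spectra for the same molecule, plus a missing one.
raw = ['17.6;0.0Q;0|29.0;0.0T;1|', '18.0;0.0Q;0|29.0;0.0T;1|', None]
combined = aggregate_dicts(spectrum_dict(s) for s in raw)
averaged = squash(combined)
print(combined)
print(averaged)
```

Taking the mean hides disagreement between duplicate measurements, which is presumably why the original docstring notes that more checking could be done at this step.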