Completed
Push — master ( 6e14ce...512eb0 )
by Rich
02:12
created

Split   A

Complexity

Total Complexity 12

Size/Duplication

Total Lines 37
Duplicated Lines 0 %

Importance

Changes 0
Metric Value
wmc 12
c 0
b 0
f 0
dl 0
loc 37
rs 10

6 Methods

Rating   Name   Duplication   Size   Complexity  
A ref() 0 3 1
A indices() 0 3 1
A __init__() 0 4 1
A to_dict() 0 7 4
A contiguous() 0 8 3
A save() 0 5 2
#! /usr/bin/env python
Coding Style: This module should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below, you find an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend reading PEP 257: Docstring Conventions.

#
# Copyright (C) 2016 Rich Lewis <[email protected]>
# License: 3-clause BSD

import warnings
import logging
import os
from collections import namedtuple

import numpy as np
Configuration: The import numpy could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint; when installing your modules, make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

import pandas as pd
Configuration: The import pandas could not be resolved.
import h5py
Configuration: The import h5py could not be resolved.
from fuel.datasets import H5PYDataset
Configuration: The import fuel.datasets could not be resolved.

from ... import forcefields
from ... import filters
from ... import descriptors
from ... import standardizers
from ... import pipeline

logger = logging.getLogger(__name__)
Coding Style (Naming): The name logger does not conform to the constant naming conventions ((([A-Z_][A-Z0-9_]*)|(__.*__))$).

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.
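
To see why a name is flagged, the patterns quoted in the naming messages of this report can be tried directly with Python's `re` module. This is only an illustrative sketch: the helper name `conforms` is hypothetical, and Pylint's own matching logic may differ in detail.

```python
import re

# Patterns copied from the naming messages in this report.
CONST_RGX = re.compile(r'(([A-Z_][A-Z0-9_]*)|(__.*__))$')
ARGUMENT_RGX = re.compile(r'[a-z_][a-z0-9_]{2,30}$')


def conforms(pattern, name):
    """Return True if the whole name matches the pattern (hypothetical helper)."""
    return pattern.match(name) is not None


print(conforms(CONST_RGX, 'logger'))   # lowercase module-level name
print(conforms(CONST_RGX, 'LOGGER'))   # upper-case constant style
print(conforms(ARGUMENT_RGX, 'ms'))    # fewer than three characters
print(conforms(ARGUMENT_RGX, 'mols'))  # three or more lowercase characters
```

Whether to rename `logger` or to relax the pattern in a project Pylint configuration is a style decision; the check itself only compares names against the regular expression.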

def default_pipeline():
    """ Return a default pipeline to be used for general datasets. """
    return pipeline.Pipeline([
        standardizers.ChemAxonStandardizer(keep_failed=True, warn_on_fail=False),
        forcefields.UFF(add_hs=True, warn_on_fail=False),
        filters.OrganicFilter(),
        filters.AtomNumberFilter(above=5, below=100, include_hydrogens=True),
        filters.MassFilter(below=1000)
    ])

DEFAULT_PYTABLES_KW = {
    'complib': 'bzip2',
    'complevel': 9
}

def contiguous_order(to_order, splits):
    """ Determine a contiguous order from non-overlapping splits, and put data in that order.

    Args:
        to_order (iterable<pd.Series, pd.DataFrame, pd.Panel>):
            The pandas objects to put in contiguous order.
        splits (iterable<pd.Series>):
            The non-overlapping splits, as boolean masks.

    Returns:
        iterable<pd.Series, pd.DataFrame, pd.Panel>: The data in contiguous order.
    """

    member = pd.Series(0, index=splits[0].index)
    for i, split in enumerate(splits):
        member[split] = i
    idx = member.sort_values().index
    return (order.reindex(idx) for order in to_order)
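
The idea behind `contiguous_order` can be sketched without pandas: label every row with the number of the split it belongs to, then sort the row positions by that label so that each split occupies one unbroken block. The function `contiguous_positions` below is a hypothetical, stdlib-only illustration rather than the module's implementation. It assumes equal-length, non-overlapping boolean masks and, like the original, leaves rows that belong to no split labelled 0.

```python
def contiguous_positions(splits):
    """Return row positions reordered so each split forms one block.

    `splits` is a sequence of equal-length, non-overlapping boolean masks.
    """
    n_rows = len(splits[0])
    member = [0] * n_rows  # rows in no split keep label 0, as in the original
    for label, mask in enumerate(splits):
        for pos, in_split in enumerate(mask):
            if in_split:
                member[pos] = label
    # sorted() is stable, so rows keep their relative order within a split.
    return sorted(range(n_rows), key=lambda pos: member[pos])


train = [True, False, True, False, False]
test = [False, True, False, False, True]
valid = [False, False, False, True, False]

order = contiguous_positions([train, test, valid])
print(order)  # train rows first, then test rows, then the validation row
```

Reindexing each data object by this order is what lets a split later be described by a simple range instead of an explicit index list.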

Feature = namedtuple('Feature', ['fper', 'key', 'axis_names'])


def default_features():
Coding Style: This function should have a docstring.
    return (
        Feature(fper=descriptors.MorganFeaturizer(),
                key='X_morg',
                axis_names=['batch', 'features']),
        Feature(fper=descriptors.PhysicochemicalFeaturizer(),
                key='X_pc',
                axis_names=['batch', 'features']),
        Feature(fper=descriptors.AtomFeaturizer(max_atoms=100),
                key='A',
                axis_names=['batch', 'atom_idx', 'features']),
        Feature(fper=descriptors.GraphDistanceTransformer(max_atoms=100),
                key='G',
                axis_names=['batch', 'atom_idx', 'atom_idx']),
        Feature(fper=descriptors.SpacialDistanceTransformer(max_atoms=100),
                key='G_d',
                axis_names=['batch', 'atom_idx', 'atom_idx']),
        Feature(fper=descriptors.ChemAxonFeaturizer(features='all'),
                key='X_cx',
                axis_names=['batch', 'features']),
        Feature(fper=descriptors.ChemAxonAtomFeaturizer(features='all', max_atoms=100),
                key='A_cx',
                axis_names=['batch', 'atom_idx', 'features'])
    )


class Split(object):
Coding Style: This class should have a docstring.

    def __init__(self, mask, name, converter):
        self.mask = mask
        self.name = name
        self.converter = converter

    @property
    def contiguous(self):
Coding Style: This method should have a docstring.
        diff = np.ediff1d(self.mask)
        if self.mask.iloc[0] != 0:
            diff[0] = 1
        if self.mask.iloc[-1] != 0:
            diff[-1] = -1
        return sum(diff == -1) == 1 or sum(diff == 1) == 1

    @property
    def indices(self):
Coding Style: This method should have a docstring.
        return np.nonzero(self.mask)[0]

    def save(self):
Coding Style: This method should have a docstring.
        self.converter.data_file[self.name + '_indices'] = self.indices
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')
            self.mask.to_hdf(self.converter.data_file.filename, '/indices/' + self.name)

    @property
    def ref(self):
Coding Style: This method should have a docstring.
        return self.converter.data_file[self.name + '_indices'].ref

    def to_dict(self):
Coding Style: This method should have a docstring.
        idx = self.indices
        if self.contiguous:
            low, high = min(idx), max(idx)
            return {source: (low, high) for source in self.converter.source_names}
        else:
            return {source: (-1, -1, self.ref) for source in self.converter.source_names}
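
As a rough illustration of what `contiguous` and `to_dict` are deciding, the sketch below checks whether the selected rows of a boolean mask form a single run and, if so, reports the bounds of that run. It is a stdlib-only approximation with hypothetical names (`is_single_run`, `bounds`), not the class's NumPy-based logic; it simply reproduces the `(min, max)` pair that `to_dict` builds.

```python
def is_single_run(mask):
    """Return True if the truthy entries of `mask` form exactly one unbroken run."""
    positions = [pos for pos, flag in enumerate(mask) if flag]
    if not positions:
        return False
    # One run means the selected positions are consecutive integers.
    return positions[-1] - positions[0] + 1 == len(positions)


def bounds(mask):
    """Return (lowest, highest) selected position, mirroring min(idx), max(idx)."""
    positions = [pos for pos, flag in enumerate(mask) if flag]
    return min(positions), max(positions)


block = [False, True, True, True, False]
scattered = [True, False, True, False, True]

print(is_single_run(block), bounds(block))
print(is_single_run(scattered))
```

A split that is a single run can be stored as a pair of bounds, while a scattered split needs the explicit index reference, which is the distinction the two branches of `to_dict` make.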


class Converter(object):
    """ Create a fuel dataset from molecules and targets. """

    def __init__(self, directory, output_directory, output_filename='default.h5'):
Unused Code: The argument directory seems to be unused.
Unused Code: The argument output_filename seems to be unused.
Unused Code: The argument output_directory seems to be unused.
        raise NotImplemented
Best Practice: NotImplemented raised - should raise NotImplementedError
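
The distinction matters because `NotImplemented` is a sentinel value meant to be returned from binary special methods, not an exception class. In Python 3, raising it fails with a `TypeError` instead of signalling an unfinished method. A minimal sketch, using hypothetical class names and an illustrative error message:

```python
class Broken(object):
    def __init__(self):
        raise NotImplemented  # a sentinel value, not an exception class


class Fixed(object):
    def __init__(self):
        raise NotImplementedError('subclasses must define __init__')


def raised_type(cls):
    """Instantiate cls and return the name of the exception type it raises."""
    try:
        cls()
    except Exception as exc:
        return type(exc).__name__
    return None


print(raised_type(Broken))
print(raised_type(Fixed))
```

Callers that catch `NotImplementedError` to detect an abstract method would therefore miss the error raised by the first form.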

    def run(self, ms, y, output_path, splits=None, features=None, pytables_kws=DEFAULT_PYTABLES_KW):
Bug / Best Practice: The default value DEFAULT_PYTABLES_KW (__builtin__.dict) might cause unintended side-effects.

Objects as default values are only created once in Python and not on each invocation of the function. If the default object is modified, this modification is carried over to the next invocation of the method.

# Bad:
# If array_param is modified inside the function, the next invocation will
# receive the modified object.
def some_function(array_param=[]):
    pass  # ...

# Better: Create an array on each invocation
def some_function(array_param=None):
    array_param = array_param or []
    # ...
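
The effect is easy to reproduce with a dictionary default, which is the case flagged here. In the sketch below the function names are hypothetical and the keys are borrowed from `DEFAULT_PYTABLES_KW` purely for illustration.

```python
def add_option_shared(key, value, options={}):
    """Mutates the default dict, which is created once at definition time."""
    options[key] = value
    return options


def add_option_fresh(key, value, options=None):
    """Creates a new dict on each call unless one is passed in."""
    if options is None:
        options = {}
    options[key] = value
    return options


first = add_option_shared('complib', 'bzip2')
second = add_option_shared('complevel', 9)
print(second)            # also contains 'complib' from the first call
print(first is second)   # both names refer to the same default dict

fresh = add_option_fresh('complevel', 9)
print(fresh)             # only the key passed in this call
```

In the code shown on this page, `run` only stores the default on the instance and unpacks it as keyword arguments, so the warning describes a risk that would materialise if the dictionary were ever modified in place.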
Coding Style (Naming): The name ms does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style (Naming): The name y does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
Best Practice: Too many arguments (7/5)

        """
           Args:
        ms (pd.Series):
            The molecules of the dataset.
        ys (pd.Series or pd.DataFrame):
            The target labels of the dataset.
        output_path (str):
            The path to which the dataset should be saved.
        features (list[Feature]):
            The features to calculate. Defaults are used if `None`.
        splits (iterable<(name, split)>):
            An iterable of name, split tuples. Splits are provided as boolean arrays of the whole data.
Coding Style: This line is too long as per the coding-style (103/100).

This check looks for lines that are too long. You can specify the maximum line length.

        """

        self.pytables_kws = pytables_kws
Coding Style: The attribute pytables_kws was defined outside __init__.

It is generally a good practice to initialize all attributes to default values in the __init__ method:

class Foo:
    def __init__(self, x=None):
        self.x = x
        self.output_path = output_path
Coding Style: The attribute output_path was defined outside __init__.
        self.features = features if features is not None else default_features()
Coding Style: The attribute features was defined outside __init__.
        self.feature_names = [feat.key for feat in self.features]
Coding Style: The attribute feature_names was defined outside __init__.
        self.task_names = ['y']
Coding Style: The attribute task_names was defined outside __init__.
        self.splits = [Split(split, name, self) for name, split in splits]
Coding Style: The attribute splits was defined outside __init__.

        self.create_file(output_path)

        self.save_splits()
        self.save_molecules(ms)
        self.save_targets(y)
        self.save_features(ms)

    @property
    def source_names(self):
Coding Style: This method should have a docstring.
        return self.feature_names + self.task_names

    @property
    def split_names(self):
Coding Style: This method should have a docstring.
        return self.splits

    def create_file(self, path):
Coding Style: This method should have a docstring.
        logger.info('Creating h5 file at %s...', self.output_path)
        self.data_file = h5py.File(path, 'w')
Coding Style: The attribute data_file was defined outside __init__.
        return self.data_file

    def save_molecules(self, mols):

        """ Save the molecules to the data file. """

        logger.info('Writing molecules to file...')
        logger.debug('Writing %s molecules to %s', len(mols), self.data_file.filename)
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')
            mols.to_hdf(self.data_file.filename, 'structure', **self.pytables_kws)
            mols.apply(lambda m: m.to_smiles().encode('utf-8')).to_hdf(self.data_file.filename, 'smiles')
Coding Style: This line is too long as per the coding-style (105/100).

    def save_frame(self, data, name, prefix='targets'):

        """ Save the a frame to the data file. """

        logger.info('Writing %s', name)
        logger.debug('Writing data of shape %s to %s', data.shape, self.data_file.filename)

        with warnings.catch_warnings():
            warnings.simplefilter('ignore')
            if len(data.shape) > 2:
                data = data.transpose(2, 1, 0)  # panel serializes backwards for some reason...
            data.to_hdf(self.data_file.filename,
                        key='/{prefix}/{name}'.format(prefix=prefix, name=name),
                        **self.pytables_kws)

        if isinstance(data, pd.Series):
            self.data_file[name] = h5py.SoftLink('/{prefix}/{name}/values'.format(prefix=prefix, name=name))
Coding Style: This line is too long as per the coding-style (108/100).
            self.data_file[name].dims[0].label = data.index.name

        elif isinstance(data, pd.DataFrame):
            self.data_file[name] = h5py.SoftLink('/{prefix}/{name}/block0_values'.format(prefix=prefix, name=name))
Coding Style: This line is too long as per the coding-style (115/100).
            self.data_file[name].dims[0].label = data.index.name
            self.data_file[name].dims[1].label = data.columns.name

        elif isinstance(data, pd.Panel):
            self.data_file[name] = h5py.SoftLink('/{prefix}/{name}/block0_values'.format(prefix=prefix, name=name))
Coding Style: This line is too long as per the coding-style (115/100).
            self.data_file[name].dims[0].label = data.minor_axis.name # as panel serializes backwards
Coding Style: This line is too long as per the coding-style (101/100).
            self.data_file[name].dims[1].label = data.major_axis.name
            self.data_file[name].dims[2].label = data.items.name

    def save_targets(self, y):
Coding Style (Naming): The name y does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style: This method should have a docstring.

        self.save_frame(y, name='y', prefix='targets')

    def save_features(self, ms):
Coding Style (Naming): The name ms does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).

        """ Save all features for the dataset. """
        logger.debug('Saving features')
        for feat in self.features:
            self._save_feature(ms, feat)

    def _save_feature(self, ms, feat):
Coding Style (Naming): The name ms does not conform to the argument naming conventions ([a-z_][a-z0-9_]{2,30}$).

        """ Calculate and save a feature to the data file. """
        logger.info('Calculating %s', feat.key)

        fps = feat.fper.transform(ms)
        self.save_frame(fps, name=feat.key, prefix='feats')

    def save_splits(self):

        """ Save the splits to the data file. """

        logger.info('Producing dataset splits...')
        for split in self.splits:
            split.save()
        split_dict = {split.name: split.to_dict() for split in self.splits}
        splits = H5PYDataset.create_split_array(split_dict)
        logger.debug('split: %s', splits)
        logger.info('Saving splits...')
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')
            self.data_file.attrs['split'] = splits

    @classmethod
    def convert(cls, **kwargs):
Coding Style: This method should have a docstring.
        kwargs.setdefault('directory', os.getcwd())
        kwargs.setdefault('output_directory', os.getcwd())

        return cls(**kwargs).output_path,

    @classmethod
    def fill_subparser(cls, subparser):
Coding Style: This method should have a docstring.
Unused Code: The argument subparser seems to be unused.
        return cls.convert