#! /usr/bin/env python
#
# Copyright (C) 2016 Rich Lewis <[email protected]>
# License: 3-clause BSD

import warnings
import tempfile
import os

import pandas as pd
The import pandas could not be resolved.
This can be caused by one of the following:
1. Missing dependencies: this error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.
    # .scrutinizer.yml
    before_commands:
        - sudo pip install abc # Python2
        - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint; when installing your modules, make sure to use the command for the correct version.
2. Missing __init__.py files: this error could also result from missing __init__.py files.
import h5py
The import h5py could not be resolved.
from fuel.datasets import H5PYDataset
The import fuel.datasets could not be resolved.
from fuel.utils import find_in_data_path
The import fuel.utils could not be resolved.
from fuel import config
The import fuel could not be resolved.


class Dataset(H5PYDataset):

    """ Abstract base class providing an interface to the skchem data format."""

    def __init__(self, **kwargs):
        kwargs.setdefault('load_in_memory', True)
        super(Dataset, self).__init__(
            file_or_path=find_in_data_path(self.filename), **kwargs)

    @classmethod
    def available_sources(cls):
This method should have a docstring.
The coding style of this project requires that you add a docstring to this code element. Below is an example for methods:
    class SomeClass:
        def some_method(self):
            """Do x and return foo."""
If you would like to know more about docstrings, we recommend reading PEP-257: Docstring Conventions.
        with h5py.File(find_in_data_path(cls.filename)) as f:
The name f does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
This check looks for invalid names for a range of different identifiers. You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements. If your project includes a Pylint configuration file, the settings contained in that file take precedence. To find out more about Pylint, please refer to their site.
            return cls.get_all_sources(f)

    @classmethod
    def available_sets(cls):
This method should have a docstring.
        with h5py.File(find_in_data_path(cls.filename)) as f:
The name f does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).
            return cls.get_all_splits(f)

    @classmethod
    def load_set(cls, set_name, sources=()):

        """ Load the sources for a single set.

        Args:
            set_name (str):
                The set name.
            sources (tuple[str]):
                The sources to return data for.

        Returns:
            tuple[np.array]
                The requested sources for the requested set.
        """
        if set_name == 'all':
            set_name = cls.set_names
        else:
            set_name = (set_name,)
        if sources == 'all':
            sources = cls.sources_names
        return cls(which_sets=set_name, sources=sources, load_in_memory=True).data_sources

    @classmethod
    def load_data(cls, sets=(), sources=()):

        """ Load a set of sources.

        Args:
            sets (tuple[str]):
                The sets to return data for.
            sources:
                The sources to return data for.

        Example:
            (X_train, y_train), (X_test, y_test) = Dataset.load_data(sets=('train', 'test'), sources=('X', 'y'))
        """

        for set_name in sets:
            yield cls.load_set(set_name, sources)

    @classmethod
    def read_frame(cls, key, *args, **kwargs):

        """ Load a set of features from the dataset as a pandas object.

        Args:
            key (str):
                The HDF5 key for required data. Typically, this will be one of

                - structure: for the raw molecules
                - smiles: for the smiles
                - features/{feat_name}: for the features
                - targets/{targ_name}: for the targets

        Returns:
            pd.Series or pd.DataFrame or pd.Panel
                The data as a dataframe.
        """

        with warnings.catch_warnings():
            warnings.simplefilter('ignore')
            data = pd.read_hdf(find_in_data_path(cls.filename), key, *args, **kwargs)
        if isinstance(data, pd.Panel):
            data = data.transpose(2, 1, 0)
        return data

    @classmethod
    def download(cls, output_directory=None, download_directory=None):

        """ Download the dataset and convert it.

        Args:
            output_directory (str):
                The directory to save the data to. Defaults to the first
                directory in the fuel data path.

            download_directory (str):
                The directory to save the raw files to. Defaults to a temporary
                directory.

        Returns:
            str:
                The path of the downloaded and processed dataset.
        """

        if not output_directory:
            output_directory = config.config['data_path']['yaml'].split(':')[0]

        output_directory = os.path.expanduser(output_directory)

        if not download_directory:
            download_directory = tempfile.mkdtemp()

        cls.downloader.download(directory=download_directory)
        return cls.converter.convert(directory=download_directory,
                                     output_directory=output_directory)
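
The directory defaulting at the top of `download` can be tried in isolation. The following is a minimal sketch, not the project's code: `resolve_directories` is a hypothetical helper, and the colon-separated search path is passed in as a plain string instead of being read from the fuel configuration.

```python
import os
import tempfile


def resolve_directories(data_path, output_directory=None, download_directory=None):
    """Hypothetical helper mirroring the defaulting logic in `download`."""
    if not output_directory:
        # Take the first entry of a colon-separated search path.
        output_directory = data_path.split(':')[0]
    # Expand a leading `~` to the user's home directory.
    output_directory = os.path.expanduser(output_directory)
    if not download_directory:
        # Fall back to a fresh temporary directory for the raw files.
        download_directory = tempfile.mkdtemp()
    return output_directory, download_directory


# The example path below is an assumption chosen for illustration.
out_dir, raw_dir = resolve_directories('~/fuel_data:/srv/fuel_data')
print(out_dir)
print(os.path.isdir(raw_dir))
```

Note that `tempfile.mkdtemp()` leaves cleanup to the caller, so the temporary download directory persists after the call unless it is removed explicitly.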
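
Because `load_data` uses `yield`, it returns a generator producing one tuple per requested set, which is why the docstring example can unpack it directly. A minimal sketch of that pattern follows, using a hypothetical in-memory `ToyDataset` in place of the HDF5-backed class; the data values are made up for illustration.

```python
class ToyDataset(object):
    """Hypothetical in-memory stand-in; not part of skchem or fuel."""

    _data = {
        'train': {'X': [[0, 1], [1, 0]], 'y': [0, 1]},
        'test': {'X': [[1, 1]], 'y': [1]},
    }

    @classmethod
    def load_set(cls, set_name, sources=()):
        # Return one tuple holding the requested sources for a single set.
        return tuple(cls._data[set_name][source] for source in sources)

    @classmethod
    def load_data(cls, sets=(), sources=()):
        # Generator: yields one tuple per requested set, in order.
        for set_name in sets:
            yield cls.load_set(set_name, sources)


# Tuple unpacking consumes the generator, one item per requested set.
(X_train, y_train), (X_test, y_test) = ToyDataset.load_data(
    sets=('train', 'test'), sources=('X', 'y'))
print(y_train, y_test)
```

With the default `sets=()`, the generator yields nothing, so callers that unpack the result need to request at least as many sets as they have targets.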
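
The naming pattern quoted in the Pylint messages above can be checked directly. This sketch assumes the pattern is applied from the start of the name, as `re.match` does; `VARIABLE_NAME` and `conforms` are names introduced here for illustration only.

```python
import re

# Pattern quoted in the Pylint message: one leading lowercase letter or
# underscore, followed by 2 to 30 lowercase letters, digits, or underscores.
VARIABLE_NAME = re.compile(r'[a-z_][a-z0-9_]{2,30}$')


def conforms(name):
    """Return True if `name` matches the quoted variable-name pattern."""
    return VARIABLE_NAME.match(name) is not None


for name in ('f', 'h5_file', 'X_train'):
    print(name, conforms(name))
```

Under this pattern a name needs at least three characters, which is why the single-letter `f` is flagged while a longer lowercase name would pass.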