Completed
Push — master ( 4d243d...e4a84f )
by Rich
01:28
created

Dataset.read_frame()   B

Complexity

Conditions 3

Size

Total Lines 25

Duplication

Lines 0
Ratio 0 %

Importance

Changes 1
Bugs 0 Features 1
Metric Value
c 1
b 0
f 1
dl 0
loc 25
rs 8.8571
cc 3
#! /usr/bin/env python
Coding Style: This module should have a docstring.

The coding style of this project requires that you add a docstring to this code element. Below is an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

If you would like to know more about docstrings, we recommend reading PEP 257: Docstring Conventions.
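
The example above covers methods, while the element flagged here is the module itself. As a minimal sketch, a module docstring is a string literal placed before the first statement (comments may precede it). The docstring wording and the `module_docstring` helper below are illustrative assumptions, not part of the reviewed commit; the standard-library `ast` module is used only to confirm that the docstring is picked up.

```python
import ast

# Hypothetical module source: header comments first, then the docstring.
MODULE_SOURCE = '''#! /usr/bin/env python
#
# License: 3-clause BSD

"""Base dataset class for the skchem data format."""

import os
'''


def module_docstring(source):
    """Return the module docstring found in source, or None if there is none."""
    return ast.get_docstring(ast.parse(source))


print(module_docstring(MODULE_SOURCE))
```

A source string without a leading string literal, such as `"import os\n"`, yields `None` from the same helper.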

#
# Copyright (C) 2016 Rich Lewis <[email protected]>
# License: 3-clause BSD

import warnings
import tempfile
import os

import pandas as pd
Configuration: The import pandas could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint. When installing your modules, make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.
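
Before adjusting the build configuration, it can help to confirm which of the imported packages are actually missing from the environment that runs pylint. The following is a minimal sketch using the standard-library `importlib`; the helper name `missing_modules` is an assumption made for illustration.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of module names that cannot be located."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ImportError:
            # Raised for dotted names whose parent package is not installed.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# The modules imported by the file under review.
print(missing_modules(['pandas', 'fuel', 'fuel.datasets', 'fuel.utils']))
```

Any name this reports would presumably need a matching install command in the build configuration.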

from fuel.datasets import H5PYDataset
Configuration: The import fuel.datasets could not be resolved.

from fuel.utils import find_in_data_path
Configuration: The import fuel.utils could not be resolved.

from fuel import config
Configuration: The import fuel could not be resolved.

class Dataset(H5PYDataset):

    """ Abstract base class providing an interface to the skchem data format."""

    def __init__(self, **kwargs):
Use of super on an old style class
        kwargs.setdefault('load_in_memory', True)
        super(Dataset, self).__init__(
            file_or_path=find_in_data_path(self.filename), **kwargs)
Bug: The Instance of Dataset does not seem to have a member named filename.

This check looks for calls to members that are non-existent. These calls will fail.

The member could have been renamed or removed.
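
One way to make such members visible, both to static checks and to readers, is to declare them on the base class and fail clearly when a subclass has not set them. This is a minimal sketch of that pattern, not the project's actual fix; `BaseDataset`, `ExampleDataset`, `resolve_filename`, and the file name `example.h5` are all hypothetical.

```python
class BaseDataset(object):
    """Hypothetical abstract base: subclasses must set ``filename``."""

    # Declared here so the attribute exists on the base class.
    filename = None

    @classmethod
    def resolve_filename(cls):
        """Return the subclass's filename, or fail with a clear error."""
        if cls.filename is None:
            raise NotImplementedError(
                '{} must define a filename'.format(cls.__name__))
        return cls.filename


class ExampleDataset(BaseDataset):
    """Hypothetical concrete dataset."""

    filename = 'example.h5'


print(ExampleDataset.resolve_filename())
```

Calling `BaseDataset.resolve_filename()` directly raises `NotImplementedError` instead of failing later with an attribute lookup error.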

    @classmethod
    def load_set(cls, set_name, sources=()):

        """ Load the sources for a single set.

        Args:
            set_name (str):
                The set name.
            sources (tuple[str]):
                The sources to return data for.

        Returns:
            tuple[np.array]
                The requested sources for the requested set.
        """
        return cls(which_sets=(set_name,), sources=sources, load_in_memory=True).data_sources
Bug: The Instance of Dataset does not seem to have a member named data_sources.

    @classmethod
    def load_data(cls, sets=(), sources=()):

        """ Load a set of sources.

        Args:
            sets (tuple[str]):
                The sets to return data for.
            sources:
                The sources to return data for.

        Example:
            (X_train, y_train), (X_test, y_test) = Dataset.load_data(sets=('train', 'test'), sources=('X', 'y'))
Coding Style: This line is too long as per the coding-style (112/100).

This check looks for lines that are too long. You can specify the maximum line length.
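
The flagged line is an example inside a docstring, so wrapping it changes no behavior. A minimal sketch of the wrapped call follows; `load_data` here is a hypothetical stand-in that only mimics the shape of the result (one tuple per set, yielded lazily) so that the snippet runs without the dataset.

```python
def load_data(sets=(), sources=()):
    """Hypothetical stand-in: yield one tuple of labels per requested set."""
    for set_name in sets:
        yield tuple('{}_{}'.format(source, set_name) for source in sources)


# The same call, broken inside the parentheses to stay under 100 characters.
(X_train, y_train), (X_test, y_test) = load_data(
    sets=('train', 'test'), sources=('X', 'y'))
```

Because the stand-in is a generator, as the reviewed method appears to be, the tuple unpacking on the left consumes it one set at a time.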

        """

        for set_name in sets:
            yield cls.load_set(set_name, sources)

    @classmethod
    def read_frame(cls, key, *args, **kwargs):

        """ Load a set of features from the dataset as a pandas object.

        Args:
            key (str):
                The HDF5 key for required data.  Typically, this will be one of

                - structure: for the raw molecules
                - smiles: for the smiles
                - features/{feat_name}: for the features
                - targets/{targ_name}: for the targets

        Returns:
            pd.Series or pd.DataFrame or pd.Panel
                The data as a dataframe.
        """

        with warnings.catch_warnings():
            warnings.simplefilter('ignore')
            data = pd.read_hdf(cls.filename, key, *args, **kwargs)
Bug: The Class Dataset does not seem to have a member named filename.

        if isinstance(data, pd.Panel):
            data = data.transpose(2, 1, 0)
        return data

    @classmethod
    def download(cls, output_directory=None, download_directory=None):

        """ Download the dataset and convert it.

        Args:
            output_directory (str):
                The directory to save the data to. Defaults to the first
                directory in the fuel data path.

            download_directory (str):
                The directory to save the raw files to. Defaults to a temporary
                directory.

        Returns:
            str:
                The path of the downloaded and processed dataset.
        """

        if not output_directory:
            output_directory = config.config['data_path']['yaml'].split(':')[0]

        output_directory = os.path.expanduser(output_directory)

        if not download_directory:
            download_directory = tempfile.mkdtemp()

        cls.downloader.download(directory=download_directory)
Bug: The Class Dataset does not seem to have a member named downloader.

        return cls.converter.convert(directory=download_directory,
Bug: The Class Dataset does not seem to have a member named converter.

                                     output_directory=output_directory)
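
The directory handling in download() can be exercised on its own. The sketch below mirrors the defaulting logic of that method using only the standard library; `resolve_directories` and its `data_path` argument are hypothetical stand-ins for the lookup through fuel's config, which is not reproduced here.

```python
import os
import tempfile


def resolve_directories(data_path, output_directory=None, download_directory=None):
    """Apply the same defaults as download(); data_path is a ':'-separated string."""
    if not output_directory:
        # Default to the first entry of the data path.
        output_directory = data_path.split(':')[0]
    output_directory = os.path.expanduser(output_directory)
    if not download_directory:
        # mkdtemp creates the directory and leaves cleanup to the caller.
        download_directory = tempfile.mkdtemp()
    return output_directory, download_directory


output_dir, download_dir = resolve_directories('~/fuel_data:/srv/fuel_data')
print(output_dir, download_dir)
os.rmdir(download_dir)  # remove the empty temporary directory again
```

Splitting on a literal `':'` follows the reviewed code; `os.pathsep` might be a more portable separator, though that is a design choice outside this commit.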