Issues (942)

skchem/data/datasets/base.py (23 issues)

#! /usr/bin/env python
This module should have a docstring.

The coding style of this project requires a docstring for this code element. Below is an example for methods:

class SomeClass:
    def some_method(self):
        """Do x and return foo."""

To learn more about docstrings, read PEP 257: Docstring Conventions.

#
# Copyright (C) 2016 Rich Lewis <[email protected]>
# License: 3-clause BSD

import warnings
import tempfile
import os

import pandas as pd
The import pandas could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a Pylint configuration issue. Make sure that your libraries are available by adding the necessary install commands:

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We currently do not use a virtualenv to run Pylint, so when installing your modules, make sure to use the command for the correct Python version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.
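Before editing `.scrutinizer.yml`, it can help to check which of the imports flagged in this file actually fail in the interpreter that Pylint runs under. Below is a minimal sketch built on the standard library's `importlib.util.find_spec`. The helper name `missing_modules` is hypothetical, and the module list simply repeats the imports from this file.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of `names` that the import system cannot find."""
    missing = []
    for name in names:
        try:
            # find_spec locates a module without executing it; for a dotted
            # name it must import the parent package first.
            spec = importlib.util.find_spec(name)
        except ImportError:
            # Raised (as ModuleNotFoundError) when a parent package such as
            # "fuel" is absent while looking up "fuel.datasets".
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == '__main__':
    print(missing_modules(['pandas', 'h5py', 'fuel', 'fuel.datasets', 'fuel.utils']))
```

Run it with the same interpreter (`python` or `python3`) that Pylint uses. Any names it prints are the packages the `before_commands` still need to install.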

import h5py
The import h5py could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Loading history...
12 1
from fuel.datasets import H5PYDataset
The import fuel.datasets could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Loading history...
13 1
from fuel.utils import find_in_data_path
The import fuel.utils could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Loading history...
14 1
from fuel import config
The import fuel could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Loading history...
class Dataset(H5PYDataset):

    """ Abstract base class providing an interface to the skchem data format."""

    def __init__(self, **kwargs):
Use of super on an old-style class.
        kwargs.setdefault('load_in_memory', True)
        super(Dataset, self).__init__(
            file_or_path=find_in_data_path(self.filename), **kwargs)
The instance of Dataset does not seem to have a member named filename.

This check looks for accesses to members that do not exist; such accesses will fail at runtime.

The member could have been renamed or removed.
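Several missing-member messages in this file concern attributes that `Dataset` never defines itself but expects concrete subclasses to provide: `filename`, `downloader`, `converter`, `set_names` and `sources_names`. The others, such as `get_all_sources` and `data_sources`, presumably come from fuel's `H5PYDataset`, which Pylint could not import. The old-style-class warning may have the same root cause. In Python 2, `super()` requires a new-style class (one deriving from `object`), while in Python 3 every class is new-style.

One way to give Pylint something to resolve is to declare the expected attributes as placeholders on the base class. The sketch below assumes that approach:

- `StandInH5PYDataset` is a hypothetical stand-in for fuel's class, so the example runs without fuel.
- `ExampleDataset` and its values are invented.
- The placeholder names are assumed not to clash with anything `H5PYDataset` defines.

```python
class StandInH5PYDataset(object):
    """Hypothetical stand-in for fuel's H5PYDataset; derives from object (new-style)."""

    def __init__(self, file_or_path, load_in_memory=False, **kwargs):
        self.file_or_path = file_or_path
        self.load_in_memory = load_in_memory
        self.extra_kwargs = kwargs


class Dataset(StandInH5PYDataset):
    """Abstract base class; concrete subclasses fill in the placeholders."""

    # Placeholders declared on the base so static checkers can resolve them.
    filename = None
    downloader = None
    converter = None
    set_names = ()
    sources_names = ()

    def __init__(self, **kwargs):
        if self.filename is None:
            raise NotImplementedError(
                '{} must define a filename'.format(type(self).__name__))
        kwargs.setdefault('load_in_memory', True)
        # Two-argument super() works for new-style classes on Python 2 and on Python 3.
        # The original passes find_in_data_path(self.filename); omitted in this sketch.
        super(Dataset, self).__init__(file_or_path=self.filename, **kwargs)


class ExampleDataset(Dataset):
    """Hypothetical concrete dataset."""

    filename = 'example.h5'
    set_names = ('train', 'test')
    sources_names = ('X', 'y')
```

A subclass that forgets `filename` now fails with an explicit `NotImplementedError` instead of an `AttributeError` deep inside the constructor.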

    @classmethod
    def available_sources(cls):
This method should have a docstring.

        with h5py.File(find_in_data_path(cls.filename)) as f:
The class Dataset does not seem to have a member named filename.

Coding Style: Naming
The name f does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, refer to the Pylint documentation.
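The pattern quoted in the message requires a lowercase letter or underscore followed by 2 to 30 more lowercase letters, digits or underscores. Any name shorter than three characters therefore fails. The check below applies that pattern with Python's `re` module. The replacement name `h5_file` is only an illustrative choice. Alternatively, Pylint's `good-names` option can whitelist short names such as `f`.

```python
import re

# The variable-name pattern quoted in the Pylint message above.
VARIABLE_RGX = re.compile(r'[a-z_][a-z0-9_]{2,30}$')


def conforms(name):
    """Return True if `name` matches the pattern from its first character."""
    return VARIABLE_RGX.match(name) is not None


for candidate in ('f', 'h5_file', 'data_file'):
    print(candidate, conforms(candidate))
```

Only `f` fails. Renaming it, for example `with h5py.File(...) as h5_file:`, satisfies the pattern without changing behaviour.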

            return cls.get_all_sources(f)
The class Dataset does not seem to have a member named get_all_sources.

    @classmethod
    def available_sets(cls):
This method should have a docstring.

        with h5py.File(find_in_data_path(cls.filename)) as f:
The class Dataset does not seem to have a member named filename.

Coding Style: Naming
The name f does not conform to the variable naming conventions ([a-z_][a-z0-9_]{2,30}$).

            return cls.get_all_splits(f)
The class Dataset does not seem to have a member named get_all_splits.

    @classmethod
    def load_set(cls, set_name, sources=()):

        """ Load the sources for a single set.

        Args:
            set_name (str):
                The set name.
            sources (tuple[str]):
                The sources to return data for.

        Returns:
            tuple[np.array]
                The requested sources for the requested set.
        """
        if set_name == 'all':
            set_name = cls.set_names
The class Dataset does not seem to have a member named set_names.

        else:
            set_name = (set_name,)
        if sources == 'all':
            sources = cls.sources_names
The class Dataset does not seem to have a member named sources_names.

        return cls(which_sets=set_name, sources=sources, load_in_memory=True).data_sources
The instance of Dataset does not seem to have a member named data_sources.
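`load_set` treats its two selection arguments differently. The string `'all'` expands to every set or source name. Otherwise, a single set name is wrapped in a tuple, while any other `sources` value is passed through unchanged.

The hypothetical helpers below isolate that normalization so it can be checked without fuel or an HDF5 file. The name lists passed in stand in for `cls.set_names` and `cls.sources_names`. Unlike the original, which returns those class attributes directly, the sketch converts the `'all'` case to a tuple.

```python
def normalize_sets(set_name, all_set_names):
    """Return the which_sets tuple that load_set would pass on."""
    if set_name == 'all':
        return tuple(all_set_names)
    # A single set name is wrapped so the dataset receives a sequence of names.
    return (set_name,)


def normalize_sources(sources, all_source_names):
    """Return the sources value that load_set would pass on."""
    if sources == 'all':
        return tuple(all_source_names)
    # Anything else, including the default empty tuple, is passed through as-is.
    return sources
```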

    @classmethod
    def load_data(cls, sets=(), sources=()):

        """ Load a set of sources.

        Args:
            sets (tuple[str]):
                The sets to return data for.
            sources:
                The sources to return data for.

        Example:
            (X_train, y_train), (X_test, y_test) = Dataset.load_data(sets=('train', 'test'), sources=('X', 'y'))
This line is too long for the project's coding style (112 characters; the limit is 100).

This check looks for lines that are too long. You can specify the maximum line length.

        """
        for set_name in sets:
            yield cls.load_set(set_name, sources)
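`load_data` is a generator that yields one tuple per requested set, which is why the docstring's example can unpack it directly. Splitting that call across lines also keeps it under the 100-character limit flagged above. The sketch below reproduces the behaviour with a hypothetical in-memory `StubDataset`, whose `load_set` simply looks up lists in a dict so that it runs without fuel or HDF5 files. Only the `load_data` body matches the original.

```python
class StubDataset(object):
    """Hypothetical stand-in: load_set reads from a dict instead of an HDF5 file."""

    _data = {
        'train': {'X': [[0, 1], [1, 0]], 'y': [0, 1]},
        'test': {'X': [[1, 1]], 'y': [1]},
    }

    @classmethod
    def load_set(cls, set_name, sources=()):
        # One entry per requested source, in the requested order.
        return tuple(cls._data[set_name][source] for source in sources)

    @classmethod
    def load_data(cls, sets=(), sources=()):
        # Same body as Dataset.load_data: one tuple per requested set.
        for set_name in sets:
            yield cls.load_set(set_name, sources)


# The docstring example, wrapped to stay within 100 characters.
(X_train, y_train), (X_test, y_test) = StubDataset.load_data(
    sets=('train', 'test'), sources=('X', 'y'))
```

A generator can be consumed only once. If the results are needed more than once, wrap the call in `list(...)`.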
    @classmethod
    def read_frame(cls, key, *args, **kwargs):

        """ Load a set of features from the dataset as a pandas object.

        Args:
            key (str):
                The HDF5 key for required data.  Typically, this will be one of

                - structure: for the raw molecules
                - smiles: for the smiles
                - features/{feat_name}: for the features
                - targets/{targ_name}: for the targets

        Returns:
            pd.Series or pd.DataFrame or pd.Panel
                The data as a dataframe.
        """

        with warnings.catch_warnings():
            warnings.simplefilter('ignore')
            data = pd.read_hdf(find_in_data_path(cls.filename), key, *args, **kwargs)
The class Dataset does not seem to have a member named filename.

        if isinstance(data, pd.Panel):
            data = data.transpose(2, 1, 0)
        return data
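The `pd.Panel` branch only applies to older pandas releases. `Panel` was deprecated in pandas 0.20, whose documentation recommended a MultiIndex DataFrame or the xarray package instead, and removed in pandas 1.0.

For reference, `transpose(2, 1, 0)` reverses the axis order: it swaps items with the minor axis and keeps the major axis in place. With NumPy, the same permutation is `array.transpose(2, 1, 0)`. Below is a dependency-free sketch on a rectangular nested list (`transpose_210` is a hypothetical name).

```python
def transpose_210(cube):
    """Permute axes (0, 1, 2) -> (2, 1, 0) of a rectangular 3-D nested list.

    The result satisfies result[k][j][i] == cube[i][j][k], the permutation
    that Panel.transpose(2, 1, 0) applies to (items, major, minor).
    """
    n_i, n_j, n_k = len(cube), len(cube[0]), len(cube[0][0])
    return [[[cube[i][j][k] for i in range(n_i)]
             for j in range(n_j)]
            for k in range(n_k)]
```

Applying the permutation twice returns the original nesting, since swapping the first and third axes is its own inverse.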
    @classmethod
    def download(cls, output_directory=None, download_directory=None):

        """ Download the dataset and convert it.

        Args:
            output_directory (str):
                The directory to save the data to. Defaults to the first
                directory in the fuel data path.

            download_directory (str):
                The directory to save the raw files to. Defaults to a temporary
                directory.

        Returns:
            str:
                The path of the downloaded and processed dataset.
        """

        if not output_directory:
            output_directory = config.config['data_path']['yaml'].split(':')[0]

        output_directory = os.path.expanduser(output_directory)

        if not download_directory:
            download_directory = tempfile.mkdtemp()

        cls.downloader.download(directory=download_directory)
The class Dataset does not seem to have a member named downloader.

        return cls.converter.convert(directory=download_directory,
The class Dataset does not seem to have a member named converter.

                                     output_directory=output_directory)
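The directory defaults in `download` can be exercised without fuel by pulling the path handling into a function. The sketch below is hypothetical: `resolve_directories` and its `data_path` argument stand in for reading `config.config['data_path']['yaml']`. It splits on `os.pathsep` rather than a literal `':'`, assuming the data path follows the platform's path-list convention (`':'` on POSIX, `';'` on Windows). Also note that the code shown above never removes the directory created by `tempfile.mkdtemp`.

```python
import os
import tempfile


def resolve_directories(data_path, output_directory=None, download_directory=None):
    """Hypothetical helper reproducing the directory defaults of Dataset.download."""
    if not output_directory:
        # First entry of the search path; os.pathsep is ':' on POSIX, ';' on Windows.
        output_directory = data_path.split(os.pathsep)[0]
    # As in the original, only the output directory has '~' expanded.
    output_directory = os.path.expanduser(output_directory)
    if not download_directory:
        # mkdtemp creates a fresh directory that the caller must clean up.
        download_directory = tempfile.mkdtemp()
    return output_directory, download_directory
```

If the raw files are not needed after conversion, `tempfile.TemporaryDirectory()` used as a context manager deletes them when the `with` block exits.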