apoor.data - Code Metrics - a-poor/apoor

apoor.data A
last analyzed 2021-01-18 17:36 UTC

Size/Duplication

Total Lines	78
Duplicated Lines	0 %

Metric	Value
wmc	4
eloc	25
dl	0
loc	78
rs	10
c	0
b	0
f	0

4 Functions

Rating	Name	Size	Complexity
A	_decompress()	16	1
A	list_datasets()	11	1
A	load_iris()	15	1
A	load_boston()	16	1

"""Dataset functions.

Includes functions for loading common datasets as pandas DataFrames.
"""

import io
import gzip
import pkgutil

import pandas as pd

_datasets = [
    "iris",
    "boston"
]

def _decompress(bstr: bytes) -> pd.DataFrame:
    """CSV gzip decompression helper.

    Decompresses a gzip-compressed CSV dataset
    and converts it to a pandas DataFrame.

    Args:
        bstr: Bytes of a gzip-compressed CSV file.

    Returns:
        pandas DataFrame parsed from the decompressed CSV.
    """
    # Decompress to bytes, then decode to text (UTF-8 is the default codec).
    decomp = gzip.decompress(bstr).decode("utf-8")
    buffer = io.StringIO(decomp)

    return pd.read_csv(buffer, encoding="utf-8")


def list_datasets():
    """Get available datasets.

    Each dataset in the list can be loaded
    with a load_<name> function, where
    <name> is the name of the dataset.

    Returns:
        A copy of the list of available dataset names.
    """
    return _datasets[:]

def load_iris():
    """ Load iris dataset.

    Loads the iris dataset as a Pandas
    DataFrame.

    Iris dataset: https://archive.ics.uci.edu/ml/datasets/iris

    Returns:
        Iris dataset as a Pandas DataFrame.
    """
    compressed = pkgutil.get_data('apoor.data', '_data/iris.csv.gz')
    df = _decompress(compressed)

    df["target"] = df["target"].astype("category")
    return df

def load_boston():
    """Load boston housing dataset.

    Loads the boston housing dataset as a Pandas
    DataFrame.

    Boston Housing dataset: https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html

    Returns:
        Boston Housing dataset as a Pandas DataFrame.
    """
    compressed = pkgutil.get_data('apoor.data', '_data/boston.csv.gz')
    df = _decompress(compressed)

    df["CHAS"] = df["CHAS"].astype("int8")
    # Note: an int32 cast truncates any fractional part of MEDV.
    df["MEDV"] = df["MEDV"].astype("int32")
    return df
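
The decompress-then-parse step used by `_decompress()` can be tried without the packaged data files. The following is a minimal sketch, not the module's own code: it uses the standard-library `csv` module as a stand-in for `pd.read_csv`, and the helper name `decompress_csv` and the sample rows are made up for illustration.

```python
import csv
import gzip
import io


def decompress_csv(bstr: bytes) -> list:
    """Decompress gzip-ed CSV bytes and parse them into a list of row dicts.

    Hypothetical helper for illustration; apoor.data returns a pandas
    DataFrame instead of a list of dicts.
    """
    # Same first two steps as _decompress(): gunzip, then decode to text.
    text = gzip.decompress(bstr).decode("utf-8")
    # Wrap the text in a file-like object so a CSV parser can read it.
    return list(csv.DictReader(io.StringIO(text)))


# Build a tiny gzip-compressed CSV in memory (made-up sample rows).
raw_csv = "sepal_length,target\n5.1,setosa\n7.0,versicolor\n"
compressed = gzip.compress(raw_csv.encode("utf-8"))

rows = decompress_csv(compressed)
print(rows[0]["target"])
```

Every value comes back as a string here, which is one reason the loaders above cast columns such as `target`, `CHAS`, and `MEDV` after parsing; `pd.read_csv` infers numeric types on its own, but not categories or narrow integer widths.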





