"""
Collection of common benchmark datasets from fairness research.

Each dataset object contains a `pandas.DataFrame` as its `df` attribute,
which holds the actual data.
The dataset object takes care of loading, preprocessing
and validating the data.
The preprocessing follows the standard practices associated with
each dataset, taken from its manual (e.g., README)
or from what others have done in the literature.
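
To illustrate this design, here is a minimal sketch of a dataset object
that loads, preprocesses and validates its data on construction and then
exposes it as `df`. It is a hypothetical example, not the implementation
of :class:`ethically.dataset.Dataset`: the class ``ToyDataset``, its helper
methods and the tiny inline table are all assumptions made for this sketch.

.. code:: python

    import pandas as pd


    class ToyDataset:
        # Hypothetical minimal sketch; not the actual ethically.dataset.Dataset.
        name = 'Toy Dataset'
        sensitive_attributes = ['sex']
        target = 'label'

        def __init__(self):
            # Loading, preprocessing and validating all happen on
            # construction, so `df` is ready to use afterwards.
            df = self._load_data()
            df = self._preprocess(df)
            self._validate(df)
            self.df = df

        def _load_data(self):
            # A real dataset would read its raw file here; a tiny inline
            # table keeps this sketch self-contained.
            return pd.DataFrame({
                'age': [25.0, 40.0, None, 33.0],
                'sex': ['Female', 'Male', 'Male', 'Female'],
                'label': [1, 0, 1, 0],
            })

        def _preprocess(self, df):
            # Drop incomplete rows, then restore an integer dtype for `age`.
            df = df.dropna().reset_index(drop=True)
            df['age'] = df['age'].astype(int)
            return df

        def _validate(self, df):
            # Fail loudly if required columns are absent or values are missing.
            required = {self.target, *self.sensitive_attributes}
            missing = required - set(df.columns)
            if missing:
                raise ValueError('missing columns: {}'.format(sorted(missing)))
            if df.isnull().values.any():
                raise ValueError('the data contains missing values')


    toy_ds = ToyDataset()
    print(toy_ds.df)

Running the three steps in the constructor means that every instance's
`df` has already passed validation by the time it is used.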

See :class:`ethically.dataset.Dataset`
for additional attributes and the complete documentation.

Currently, these are the available datasets:

    - ProPublica recidivism/COMPAS dataset,
      see: :class:`~ethically.dataset.COMPASDataset`

    - Adult dataset, see: :class:`~ethically.dataset.AdultDataset`

    - German credit dataset, see: :class:`~ethically.dataset.GermanDataset`

    - FICO credit score dataset,
      see: :func:`~ethically.dataset.build_FICO_dataset`

Usage
-----
.. code:: python

    >>> from ethically.dataset import COMPASDataset
    >>> compas_ds = COMPASDataset()
    >>> print(compas_ds)
    <ProPublica Recidivism/COMPAS Dataset. 6172 rows, 56 columns in
    which {race, sex} are sensitive attributes>
    >>> type(compas_ds.df)
    <class 'pandas.core.frame.DataFrame'>
    >>> compas_ds.df['race'].value_counts()
    African-American    3175
    Caucasian           2103
    Hispanic             509
    Other                343
    Asian                 31
    Native American       11
    Name: race, dtype: int64
"""

from ethically.dataset.adult import AdultDataset
from ethically.dataset.compas import COMPASDataset
from ethically.dataset.core import Dataset
from ethically.dataset.fico import build_FICO_dataset
from ethically.dataset.german import GermanDataset