"""
Collection of common benchmark datasets from fairness research.

Each dataset object holds the actual data in a `pandas.DataFrame`,
available as its `df` attribute.
The dataset object takes care of loading, preprocessing
and validating the data.
Preprocessing follows the standard practices associated with
each dataset: either its manual (e.g., README)
or the approach taken by others in the literature.

See :class:`ethically.dataset.Dataset`
for additional attributes and the complete documentation.

The following datasets are currently available:

- ProPublica recidivism/COMPAS dataset,
  see: :class:`~ethically.dataset.COMPASDataset`

- Adult dataset, see: :class:`~ethically.dataset.AdultDataset`

- German credit dataset, see: :class:`~ethically.dataset.GermanDataset`

Usage
-----
.. code:: python

    >>> from ethically.dataset import COMPASDataset
    >>> compas_ds = COMPASDataset()
    >>> print(compas_ds)
    <ProPublica Recidivism/COMPAS Dataset. 6172 rows, 56 columns in
    which {race, sex} are sensitive attributes>
    >>> type(compas_ds.df)
    <class 'pandas.core.frame.DataFrame'>
    >>> compas_ds.df['race'].value_counts()
    African-American    3175
    Caucasian           2103
    Hispanic             509
    Other                343
    Asian                 31
    Native American       11
    Name: race, dtype: int64
"""

from ethically.dataset.adult import AdultDataset
from ethically.dataset.compas import COMPASDataset
from ethically.dataset.core import Dataset
from ethically.dataset.fico import build_FICO_dataset
from ethically.dataset.german import GermanDataset