Passed
Pull Request — dev (#30)
by Shlomi
02:40
created

responsibly.we.data._extract_caliskan   A

Complexity

Total Complexity 2

Size/Duplication

Total Lines 197
Duplicated Lines 0 %

Importance

Changes 0
Metric Value
eloc 69
dl 0
loc 197
rs 10
c 0
b 0
f 0
wmc 2

2 Functions

Rating   Name   Duplication   Size   Complexity  
A parse_line() 0 3 1
A parse_case() 0 6 1
1
# pylint: skip-file
2
3
import json
4
5
6
TEXT = '''WEAT_DATA 1: We use the flower and insect target words along with pleasant and unpleasant attributes
7
found in (5).
8
• Flowers: aster, clover, hyacinth, marigold, poppy, azalea, crocus, iris, orchid, rose, bluebell,
9
daffodil, lilac, pansy, tulip, buttercup, daisy, lily, peony, violet, carnation, gladiola,
10
magnolia, petunia, zinnia.
11
• Insects: ant, caterpillar, flea, locust, spider, bedbug, centipede, fly, maggot, tarantula,
12
bee, cockroach, gnat, mosquito, termite, beetle, cricket, hornet, moth, wasp, blackfly,
13
dragonfly, horsefly, roach, weevil.
14
• Pleasant: caress, freedom, health, love, peace, cheer, friend, heaven, loyal, pleasure, diamond,
15
gentle, honest, lucky, rainbow, diploma, gift, honor, miracle, sunrise, family,
16
happy, laughter, paradise, vacation.
17
• Unpleasant: abuse, crash, filth, murder, sickness, accident, death, grief, poison, stink,
18
assault, disaster, hatred, pollute, tragedy, divorce, jail, poverty, ugly, cancer, kill, rotten,
19
vomit, agony, prison.
20
WEAT_DATA 2: We use the musical instruments and weapons target words along with pleasant and
21
unpleasant attributes found in (5).
22
• Instruments: bagpipe, cello, guitar, lute, trombone, banjo, clarinet, harmonica, mandolin,
23
trumpet, bassoon, drum, harp, oboe, tuba, bell, fiddle, harpsichord, piano, viola, bongo,
24
flute, horn, saxophone, violin.
25
• Weapons: arrow, club, gun, missile, spear, axe, dagger, harpoon, pistol, sword, blade,
26
dynamite, hatchet, rifle, tank, bomb, firearm, knife, shotgun, teargas, cannon, grenade,
27
mace, slingshot, whip.
28
• Pleasant: As per previous experiment with insects and flowers.
29
• Unpleasant: As per previous experiment with insects and flowers.
30
WEAT_DATA 3: We use the European American and African American names along with pleasant
31
and unpleasant attributes found in (5). Names that are marked with italics are excluded from
32
our replication. In the case of African American names this was due to being to infrequent to
33
occur in GloVe’s Common Crawl corpus; in the case of European American names an equal
34
number were deleted, chosen at random.
35
• European American names: Adam, Chip, Harry, Josh, Roger, Alan, Frank, Ian, Justin,
36
Ryan, Andrew, Fred, Jack, Matthew, Stephen, Brad, Greg, Jed, Paul, Todd, Brandon,
37
Hank, Jonathan, Peter, Wilbur, Amanda, Courtney, Heather, Melanie, Sara, Amber, Crystal,
38
Katie, Meredith, Shannon, Betsy, Donna, Kristin, Nancy, Stephanie, Bobbie-Sue,
39
Ellen, Lauren, Peggy, Sue-Ellen, Colleen, Emily, Megan, Rachel, Wendy.
40
• African American names: Alonzo, Jamel, Lerone, Percell, Theo, Alphonse, Jerome,
41
Leroy, Rasaan, Torrance, Darnell, Lamar, Lionel, Rashaun, Tyree, Deion, Lamont, Malik,
42
Terrence, Tyrone, Everol, Lavon, Marcellus, Terryl, Wardell, Aiesha, Lashelle, Nichelle,
43
Shereen, Temeka, Ebony, Latisha, Shaniqua, Tameisha, Teretha, Jasmine, Latonya, Shanise,
44
Tanisha, Tia, Lakisha, Latoya, Sharise, Tashika, Yolanda, Lashandra, Malika, Shavonn,
45
Tawanda, Yvette.
46
• Pleasant: caress, freedom, health, love, peace, cheer, friend, heaven, loyal, pleasure, diamond,
47
gentle, honest, lucky, rainbow, diploma, gift, honor, miracle, sunrise, family,
48
happy, laughter, paradise, vacation.
49
• Unpleasant: abuse, crash, filth, murder, sickness, accident, death, grief, poison, stink,
50
assault, disaster, hatred, pollute, tragedy, bomb, divorce, jail, poverty, ugly, cancer, evil,
51
kill, rotten, vomit.
52
WEAT_DATA 4: We use the European American and African American names from (7), along with
53
pleasant and unpleasant attributes found in (5).
54
• European American names: Brad, Brendan, Geoffrey, Greg, Brett, Jay, Matthew, Neil,
55
Todd, Allison, Anne, Carrie, Emily, Jill, Laurie, Kristen, Meredith, Sarah.
56
• African American names: Darnell, Hakim, Jermaine, Kareem, Jamal, Leroy, Rasheed,
57
Tremayne, Tyrone, Aisha, Ebony, Keisha, Kenya, Latonya, Lakisha, Latoya, Tamika,
58
Tanisha.
59
• Pleasant: caress, freedom, health, love, peace, cheer, friend, heaven, loyal, pleasure, diamond,
60
gentle, honest, lucky, rainbow, diploma, gift, honor, miracle, sunrise, family,
61
happy, laughter, paradise, vacation.
62
• Unpleasant: abuse, crash, filth, murder, sickness, accident, death, grief, poison, stink,
63
assault, disaster, hatred, pollute, tragedy, bomb, divorce, jail, poverty, ugly, cancer, evil,
64
kill, rotten, vomit.
65
WEAT_DATA 5: We use the European American and African American names from (7), along with
66
pleasant and unpleasant attributes found in (9).
67
• European American names: Brad, Brendan, Geoffrey, Greg, Brett, Jay, Matthew, Neil,
68
Todd, Allison, Anne, Carrie, Emily, Jill, Laurie, Kristen, Meredith, Sarah.
69
• African American names: Darnell, Hakim, Jermaine, Kareem, Jamal, Leroy, Rasheed,
70
Tremayne, Tyrone, Aisha, Ebony, Keisha, Kenya, Latonya, Lakisha, Latoya, Tamika,
71
Tanisha.
72
• Pleasant: joy, love, peace, wonderful, pleasure, friend, laughter, happy.
73
• Unpleasant: agony, terrible, horrible, nasty, evil, war, awful, failure.
74
WEAT_DATA 6: We use the male and female names along with career and family attributes found
75
in (9).
76
• Male names: John, Paul, Mike, Kevin, Steve, Greg, Jeff, Bill.
77
• Female names: Amy, Joan, Lisa, Sarah, Diana, Kate, Ann, Donna.
78
• Career: executive, management, professional, corporation, salary, office, business, career.
79
• Family: home, parents, children, family, cousins, marriage, wedding, relatives.
80
WEAT_DATA 7: We use the math and arts target words along with male and female attributes found
81
in (9).
82
• Math: math, algebra, geometry, calculus, equations, computation, numbers, addition.
83
• Arts: poetry, art, dance, literature, novel, symphony, drama, sculpture.
84
• Male terms: male, man, boy, brother, he, him, his, son.
85
• Female terms: female, woman, girl, sister, she, her, hers, daughter.
86
WEAT_DATA 8: We use the science and arts target words along with male and female attributes
87
found in (10).
88
• Science: science, technology, physics, chemistry, Einstein, NASA, experiment, astronomy.
89
• Arts: poetry, art, Shakespeare, dance, literature, novel, symphony, drama.
90
• Male terms: brother, father, uncle, grandfather, son, he, his, him.
91
• Female terms: sister, mother, aunt, grandmother, daughter, she, hers, her.
92
WEAT_DATA 9: We use the mental and physical disease target words along with uncontrollability
93
and controllability attributes found in (23).
94
• Mental disease: sad, hopeless, gloomy, tearful, miserable, depressed.
95
• Physical disease: sick, illness, influenza, disease, virus, cancer.
96
• Temporary: impermanent, unstable, variable, fleeting, short-term, brief, occasional.
97
• Permanent: stable, always, constant, persistent, chronic, prolonged, forever.
98
WEAT_DATA 10: We use young and old people’s names as target words along with pleasant and
99
unpleasant attributes found in (9).
100
• Young people’s names: Tiffany, Michelle, Cindy, Kristy, Brad, Eric, Joey, Billy.
101
• Old people’s names: Ethel, Bernice, Gertrude, Agnes, Cecil, Wilbert, Mortimer, Edgar.
102
• Pleasant: joy, love, peace, wonderful, pleasure, friend, laughter, happy.
103
• Unpleasant: agony, terrible, horrible, nasty, evil, war, awful, failure.'''
104
105
106
def parse_line(line):
107
    name, words_str = line[1:].split(': ')
108
    return {'name': name, 'words': words_str[:-2].split(', ')}
109
110
111
def parse_case(case):
112
    groups_str = case.replace('\n', ' ').split('•')[1:]
113
    return {'first_target': parse_line(groups_str[0]),
114
            'second_target': parse_line(groups_str[1]),
115
            'first_attribute': parse_line(groups_str[2]),
116
            'second_attribute': parse_line(groups_str[3])}
117
118
119
cases = TEXT.split('WEAT_DATA')[1:]
120
121
WEAT_DATA = [parse_case(case) for case in cases]
122
123
WEAT_DATA[1]['first_attribute']['words'] = WEAT_DATA[0]['first_attribute']['words']
124
WEAT_DATA[1]['second_attribute']['words'] = WEAT_DATA[0]['second_attribute']['words']
125
126
WEAT_DATA[2]['first_target']['remove'] = ['Chip', 'Ian', 'Fred', 'Jed', 'Todd', 'Brandon', 'Hank', 'Wilbur', 'Sara', 'Amber', 'Crystal', 'Meredith', 'Shannon', 'Donna', 'Bobbie-Sue', 'Peggy', 'Sue-Ellen', 'Wendy']
127
WEAT_DATA[2]['second_target']['remove'] = ['Lerone', 'Percell', 'Rasaan', 'Rashaun', 'Everol', 'Terryl', 'Aiesha', 'Lashelle', 'Temeka', 'Tameisha', 'Teretha', 'Latonya', 'Shanise', 'Sharise', 'Tashika', 'Lashandra', 'Shavonn', 'Tawanda']
128
129
assert len(WEAT_DATA[2]['first_target']['remove']) == len(WEAT_DATA[2]['second_target']['remove'])
130
assert set(WEAT_DATA[2]['first_target']['remove']).issubset(WEAT_DATA[2]['first_target']['words'])
131
132
WEAT_DATA[3]['first_target']['remove'] = ['Jay', 'Kristen']
133
WEAT_DATA[3]['second_target']['remove'] = ['Tremayne', 'Latonya']
134
135
assert len(WEAT_DATA[3]['first_target']['remove']) == len(WEAT_DATA[3]['second_target']['remove'])
136
assert set(WEAT_DATA[3]['first_target']['remove']).issubset(WEAT_DATA[3]['first_target']['words'])
137
138
WEAT_DATA[4]['first_target']['remove'] = ['Jay', 'Kristen']
139
WEAT_DATA[4]['second_target']['remove'] = ['Tremayne', 'Latonya']
140
141
assert len(WEAT_DATA[4]['first_target']['remove']) == len(WEAT_DATA[4]['second_target']['remove'])
142
assert set(WEAT_DATA[4]['first_target']['remove']).issubset(WEAT_DATA[4]['first_target']['words'])
143
144
145
WEAT_DATA[0]['original_finding'] = {'Ref': 'A. G. Greenwald, D. E. McGhee, J. L. Schwartz, Measuring individual differences in im- plicit cognition: the implicit association test., Journal of personality and social psychology 74, 1464 (1998).',
146
                                    'N': '32',
147
                                    'd': '1.35',
148
                                    'p': '1e-8'}
149
150
WEAT_DATA[1]['original_finding'] = {'Ref': 'A. G. Greenwald, D. E. McGhee, J. L. Schwartz, Measuring individual differences in im- plicit cognition: the implicit association test., Journal of personality and social psychology 74, 1464 (1998).',
151
                                    'N': '32',
152
                                    'd': '1.66',
153
                                    'p': '1e-10'}
154
155
WEAT_DATA[2]['original_finding'] = {'Ref': 'A. G. Greenwald, D. E. McGhee, J. L. Schwartz, Measuring individual differences in im- plicit cognition: the implicit association test., Journal of personality and social psychology 74, 1464 (1998).',
156
                                    'N': '26',
157
                                    'd': '1.17',
158
                                    'p': '1e-5'}
159
160
WEAT_DATA[3]['original_finding'] = {'Ref': 'M. Bertrand, S. Mullainathan, Are Emily and Greg more employable than Lakisha and Jamal? a field experiment on labor market discrimination, The American Economic Review 94, 991 (2004).',
161
                                    'N': '',
162
                                    'd': '',
163
                                    'p': ''}
164
165
WEAT_DATA[4]['original_finding'] = {'Ref': 'M. Bertrand, S. Mullainathan, Are Emily and Greg more employable than Lakisha and Jamal? a field experiment on labor market discrimination, The American Economic Review 94, 991 (2004).',
166
                                    'N': '',
167
                                    'd': '',
168
                                    'p': ''}
169
170
WEAT_DATA[5]['original_finding'] = {'Ref': 'B. A. Nosek, M. Banaji, A. G. Greenwald, Harvesting implicit group attitudes and beliefs from a demonstration web site., Group Dynamics: Theory, Research, and Practice 6, 101 (2002).',
171
                                    'N': '39k',
172
                                    'd': '0.72',
173
                                    'p': '< 1e-2'}
174
175
WEAT_DATA[6]['original_finding'] = {'Ref': 'B. A. Nosek, M. Banaji, A. G. Greenwald, Harvesting implicit group attitudes and beliefs from a demonstration web site., Group Dynamics: Theory, Research, and Practice 6, 101 (2002).',
176
                                    'N': '28k',
177
                                    'd': '0.82',
178
                                    'p': '< 1e-2'}
179
180
WEAT_DATA[7]['original_finding'] = {'Ref': 'B. A. Nosek, M. R. Banaji, A. G. Greenwald, Math=male, me=female, therefore math̸=me., Journal of Personality and Social Psychology 83, 44 (2002).',
181
                                    'N': '91',
182
                                    'd': '1.47',
183
                                    'p': '1e-24'}
184
185
WEAT_DATA[8]['original_finding'] = {'Ref': 'P. D. Turney, P. Pantel, From frequency to meaning: Vector space models of semantics, Journal of Artificial Intelligence Research 37, 141 (2010).',
186
                                    'N': '135',
187
                                    'd': '1.01',
188
                                    'p': '1e-3'}
189
190
191
WEAT_DATA[9]['original_finding'] = {'Ref': 'B. A. Nosek, M. Banaji, A. G. Greenwald, Harvesting implicit group attitudes and beliefs from a demonstration web site., Group Dynamics: Theory, Research, and Practice 6, 101 (2002).',
192
                                    'N': '43k',
193
                                    'd': '1.42',
194
                                    'p': '< 1e-2'}
195
196
json.dump(WEAT_DATA, open('weat.json', 'w'), indent=True)
197