Passed
Push — master ( 170db5...8af2aa )
by Shlomi
02:43 queued 58s
created

ethically.we.data._extract_caliskan   A

Complexity

Total Complexity 2

Size/Duplication

Total Lines 200
Duplicated Lines 0 %

Importance

Changes 0
Metric Value
eloc 72
dl 0
loc 200
rs 10
c 0
b 0
f 0
wmc 2

2 Functions

Rating   Name   Duplication   Size   Complexity  
A parse_case() 0 6 1
A parse_line() 0 3 1
1
# pylint: skip-file
2
3
import json
4
5
6
TEXT = '''WEAT_DATA 1: We use the flower and insect target words along with pleasant and unpleasant attributes
7
found in (5).
8
• Flowers: aster, clover, hyacinth, marigold, poppy, azalea, crocus, iris, orchid, rose, bluebell,
9
daffodil, lilac, pansy, tulip, buttercup, daisy, lily, peony, violet, carnation, gladiola,
10
magnolia, petunia, zinnia.
11
• Insects: ant, caterpillar, flea, locust, spider, bedbug, centipede, fly, maggot, tarantula,
12
bee, cockroach, gnat, mosquito, termite, beetle, cricket, hornet, moth, wasp, blackfly,
13
dragonfly, horsefly, roach, weevil.
14
• Pleasant: caress, freedom, health, love, peace, cheer, friend, heaven, loyal, pleasure, diamond,
15
gentle, honest, lucky, rainbow, diploma, gift, honor, miracle, sunrise, family,
16
happy, laughter, paradise, vacation.
17
• Unpleasant: abuse, crash, filth, murder, sickness, accident, death, grief, poison, stink,
18
assault, disaster, hatred, pollute, tragedy, divorce, jail, poverty, ugly, cancer, kill, rotten,
19
vomit, agony, prison.
20
WEAT_DATA 2: We use the musical instruments and weapons target words along with pleasant and
21
unpleasant attributes found in (5).
22
• Instruments: bagpipe, cello, guitar, lute, trombone, banjo, clarinet, harmonica, mandolin,
23
trumpet, bassoon, drum, harp, oboe, tuba, bell, fiddle, harpsichord, piano, viola, bongo,
24
flute, horn, saxophone, violin.
25
• Weapons: arrow, club, gun, missile, spear, axe, dagger, harpoon, pistol, sword, blade,
26
dynamite, hatchet, rifle, tank, bomb, firearm, knife, shotgun, teargas, cannon, grenade,
27
mace, slingshot, whip.
28
• Pleasant: As per previous experiment with insects and flowers.
29
• Unpleasant: As per previous experiment with insects and flowers.
30
WEAT_DATA 3: We use the European American and African American names along with pleasant
31
and unpleasant attributes found in (5). Names that are marked with italics are excluded from
32
our replication. In the case of African American names this was due to being to infrequent to
33
occur in GloVe’s Common Crawl corpus; in the case of European American names an equal
34
number were deleted, chosen at random.
35
• European American names: Adam, Chip, Harry, Josh, Roger, Alan, Frank, Ian, Justin,
36
Ryan, Andrew, Fred, Jack, Matthew, Stephen, Brad, Greg, Jed, Paul, Todd, Brandon,
37
Hank, Jonathan, Peter, Wilbur, Amanda, Courtney, Heather, Melanie, Sara, Amber, Crystal,
38
Katie, Meredith, Shannon, Betsy, Donna, Kristin, Nancy, Stephanie, Bobbie-Sue,
39
Ellen, Lauren, Peggy, Sue-Ellen, Colleen, Emily, Megan, Rachel, Wendy.
40
• African American names: Alonzo, Jamel, Lerone, Percell, Theo, Alphonse, Jerome,
41
Leroy, Rasaan, Torrance, Darnell, Lamar, Lionel, Rashaun, Tyree, Deion, Lamont, Malik,
42
Terrence, Tyrone, Everol, Lavon, Marcellus, Terryl, Wardell, Aiesha, Lashelle, Nichelle,
43
Shereen, Temeka, Ebony, Latisha, Shaniqua, Tameisha, Teretha, Jasmine, Latonya, Shanise,
44
Tanisha, Tia, Lakisha, Latoya, Sharise, Tashika, Yolanda, Lashandra, Malika, Shavonn,
45
Tawanda, Yvette.
46
• Pleasant: caress, freedom, health, love, peace, cheer, friend, heaven, loyal, pleasure, diamond,
47
gentle, honest, lucky, rainbow, diploma, gift, honor, miracle, sunrise, family,
48
happy, laughter, paradise, vacation.
49
• Unpleasant: abuse, crash, filth, murder, sickness, accident, death, grief, poison, stink,
50
assault, disaster, hatred, pollute, tragedy, bomb, divorce, jail, poverty, ugly, cancer, evil,
51
kill, rotten, vomit.
52
WEAT_DATA 4: We use the European American and African American names from (7), along with
53
pleasant and unpleasant attributes found in (5).
54
• European American names: Brad, Brendan, Geoffrey, Greg, Brett, Jay, Matthew, Neil,
55
Todd, Allison, Anne, Carrie, Emily, Jill, Laurie, Kristen, Meredith, Sarah.
56
• African American names: Darnell, Hakim, Jermaine, Kareem, Jamal, Leroy, Rasheed,
57
Tremayne, Tyrone, Aisha, Ebony, Keisha, Kenya, Latonya, Lakisha, Latoya, Tamika,
58
Tanisha.
59
• Pleasant: caress, freedom, health, love, peace, cheer, friend, heaven, loyal, pleasure, diamond,
60
gentle, honest, lucky, rainbow, diploma, gift, honor, miracle, sunrise, family,
61
happy, laughter, paradise, vacation.
62
• Unpleasant: abuse, crash, filth, murder, sickness, accident, death, grief, poison, stink,
63
assault, disaster, hatred, pollute, tragedy, bomb, divorce, jail, poverty, ugly, cancer, evil,
64
kill, rotten, vomit.
65
WEAT_DATA 5: We use the European American and African American names from (7), along with
66
pleasant and unpleasant attributes found in (9).
67
• European American names: Brad, Brendan, Geoffrey, Greg, Brett, Jay, Matthew, Neil,
68
Todd, Allison, Anne, Carrie, Emily, Jill, Laurie, Kristen, Meredith, Sarah.
69
• African American names: Darnell, Hakim, Jermaine, Kareem, Jamal, Leroy, Rasheed,
70
Tremayne, Tyrone, Aisha, Ebony, Keisha, Kenya, Latonya, Lakisha, Latoya, Tamika,
71
Tanisha.
72
• Pleasant: joy, love, peace, wonderful, pleasure, friend, laughter, happy.
73
• Unpleasant: agony, terrible, horrible, nasty, evil, war, awful, failure.
74
WEAT_DATA 6: We use the male and female names along with career and family attributes found
75
in (9).
76
• Male names: John, Paul, Mike, Kevin, Steve, Greg, Jeff, Bill.
77
• Female names: Amy, Joan, Lisa, Sarah, Diana, Kate, Ann, Donna.
78
• Career: executive, management, professional, corporation, salary, office, business, career.
79
• Family: home, parents, children, family, cousins, marriage, wedding, relatives.
80
WEAT_DATA 7: We use the math and arts target words along with male and female attributes found
81
in (9).
82
• Math: math, algebra, geometry, calculus, equations, computation, numbers, addition.
83
• Arts: poetry, art, dance, literature, novel, symphony, drama, sculpture.
84
• Male terms: male, man, boy, brother, he, him, his, son.
85
• Female terms: female, woman, girl, sister, she, her, hers, daughter.
86
WEAT_DATA 8: We use the science and arts target words along with male and female attributes
87
found in (10).
88
• Science: science, technology, physics, chemistry, Einstein, NASA, experiment, astronomy.
89
• Arts: poetry, art, Shakespeare, dance, literature, novel, symphony, drama.
90
• Male terms: brother, father, uncle, grandfather, son, he, his, him.
91
• Female terms: sister, mother, aunt, grandmother, daughter, she, hers, her.
92
WEAT_DATA 9: We use the mental and physical disease target words along with uncontrollability
93
and controllability attributes found in (23).
94
• Mental disease: sad, hopeless, gloomy, tearful, miserable, depressed.
95
• Physical disease: sick, illness, influenza, disease, virus, cancer.
96
• Temporary: impermanent, unstable, variable, fleeting, short-term, brief, occasional.
97
• Permanent: stable, always, constant, persistent, chronic, prolonged, forever.
98
WEAT_DATA 10: We use young and old people’s names as target words along with pleasant and
99
unpleasant attributes found in (9).
100
• Young people’s names: Tiffany, Michelle, Cindy, Kristy, Brad, Eric, Joey, Billy.
101
• Old people’s names: Ethel, Bernice, Gertrude, Agnes, Cecil, Wilbert, Mortimer, Edgar.
102
• Pleasant: joy, love, peace, wonderful, pleasure, friend, laughter, happy.
103
• Unpleasant: agony, terrible, horrible, nasty, evil, war, awful, failure.'''
104
105
106
def parse_line(line):
107
    name, words_str = line[1:].split(': ')
108
    return {'name': name, 'words': words_str[:-2].split(', ')}
109
110
111
def parse_case(case):
112
    groups_str = case.replace('\n', ' ').split('•')[1:]
113
    return {'first_target': parse_line(groups_str[0]),
114
            'second_target': parse_line(groups_str[1]),
115
            'first_attribute': parse_line(groups_str[2]),
116
            'second_attribute': parse_line(groups_str[3])}
117
118
119
cases = TEXT.split('WEAT_DATA')[1:]
120
121
WEAT_DATA = [parse_case(case) for case in cases]
122
123
WEAT_DATA[1]['first_attribute']['words'] = WEAT_DATA[0]['first_attribute']['words']
124
WEAT_DATA[1]['second_attribute']['words'] = WEAT_DATA[0]['second_attribute']['words']
125
126
WEAT_DATA[2]['first_target']['remove'] = ['Chip', 'Ian', 'Fred', 'Jed', 'Todd', 'Brandon', 'Hank', 'Wilbur', 'Sara', 'Amber', 'Crystal', 'Meredith', 'Shannon', 'Donna', 'Bobbie-Sue', 'Peggy', 'Sue-Ellen', 'Wendy']
127
WEAT_DATA[2]['second_target']['remove'] = ['Lerone', 'Percell', 'Rasaan', 'Rashaun', 'Everol', 'Terryl', 'Aiesha', 'Lashelle', 'Temeka', 'Tameisha', 'Teretha', 'Latonya', 'Shanise', 'Sharise', 'Tashika', 'Lashandra', 'Shavonn', 'Tawanda']
128
129
print(len(WEAT_DATA[2]['first_target']['remove']), len(WEAT_DATA[2]['second_target']['remove']))
130
assert len(WEAT_DATA[2]['first_target']['remove']) == len(WEAT_DATA[2]['second_target']['remove'])
131
assert set(WEAT_DATA[2]['first_target']['remove']).issubset(WEAT_DATA[2]['first_target']['words'])
132
133
WEAT_DATA[3]['first_target']['remove'] = ['Jay', 'Kristen']
134
WEAT_DATA[3]['second_target']['remove'] = ['Tremayne', 'Latonya']
135
136
print(len(WEAT_DATA[3]['first_target']['remove']), len(WEAT_DATA[3]['second_target']['remove']))
137
assert len(WEAT_DATA[3]['first_target']['remove']) == len(WEAT_DATA[3]['second_target']['remove'])
138
assert set(WEAT_DATA[3]['first_target']['remove']).issubset(WEAT_DATA[3]['first_target']['words'])
139
140
WEAT_DATA[4]['first_target']['remove'] = ['Jay', 'Kristen']
141
WEAT_DATA[4]['second_target']['remove'] = ['Tremayne', 'Latonya']
142
143
print(len(WEAT_DATA[4]['first_target']['remove']), len(WEAT_DATA[4]['second_target']['remove']))
144
assert len(WEAT_DATA[4]['first_target']['remove']) == len(WEAT_DATA[4]['second_target']['remove'])
145
assert set(WEAT_DATA[4]['first_target']['remove']).issubset(WEAT_DATA[4]['first_target']['words'])
146
147
148
WEAT_DATA[0]['original_finding'] = {'Ref': 'A. G. Greenwald, D. E. McGhee, J. L. Schwartz, Measuring individual differences in im- plicit cognition: the implicit association test., Journal of personality and social psychology 74, 1464 (1998).',
149
                                    'N': '32',
150
                                    'd': '1.35',
151
                                    'p': '1e-8'}
152
153
WEAT_DATA[1]['original_finding'] = {'Ref': 'A. G. Greenwald, D. E. McGhee, J. L. Schwartz, Measuring individual differences in im- plicit cognition: the implicit association test., Journal of personality and social psychology 74, 1464 (1998).',
154
                                    'N': '32',
155
                                    'd': '1.66',
156
                                    'p': '1e-10'}
157
158
WEAT_DATA[2]['original_finding'] = {'Ref': 'A. G. Greenwald, D. E. McGhee, J. L. Schwartz, Measuring individual differences in im- plicit cognition: the implicit association test., Journal of personality and social psychology 74, 1464 (1998).',
159
                                    'N': '26',
160
                                    'd': '1.17',
161
                                    'p': '1e-5'}
162
163
WEAT_DATA[3]['original_finding'] = {'Ref': 'M. Bertrand, S. Mullainathan, Are Emily and Greg more employable than Lakisha and Jamal? a field experiment on labor market discrimination, The American Economic Review 94, 991 (2004).',
164
                                    'N': '',
165
                                    'd': '',
166
                                    'p': ''}
167
168
WEAT_DATA[4]['original_finding'] = {'Ref': 'M. Bertrand, S. Mullainathan, Are Emily and Greg more employable than Lakisha and Jamal? a field experiment on labor market discrimination, The American Economic Review 94, 991 (2004).',
169
                                    'N': '',
170
                                    'd': '',
171
                                    'p': ''}
172
173
WEAT_DATA[5]['original_finding'] = {'Ref': 'B. A. Nosek, M. Banaji, A. G. Greenwald, Harvesting implicit group attitudes and beliefs from a demonstration web site., Group Dynamics: Theory, Research, and Practice 6, 101 (2002).',
174
                                    'N': '39k',
175
                                    'd': '0.72',
176
                                    'p': '< 1e-2'}
177
178
WEAT_DATA[6]['original_finding'] = {'Ref': 'B. A. Nosek, M. Banaji, A. G. Greenwald, Harvesting implicit group attitudes and beliefs from a demonstration web site., Group Dynamics: Theory, Research, and Practice 6, 101 (2002).',
179
                                    'N': '28k',
180
                                    'd': '0.82',
181
                                    'p': '< 1e-2'}
182
183
WEAT_DATA[7]['original_finding'] = {'Ref': 'B. A. Nosek, M. R. Banaji, A. G. Greenwald, Math=male, me=female, therefore math̸=me., Journal of Personality and Social Psychology 83, 44 (2002).',
184
                                    'N': '91',
185
                                    'd': '1.47',
186
                                    'p': '1e-24'}
187
188
WEAT_DATA[8]['original_finding'] = {'Ref': 'P. D. Turney, P. Pantel, From frequency to meaning: Vector space models of semantics, Journal of Artificial Intelligence Research 37, 141 (2010).',
189
                                    'N': '135',
190
                                    'd': '1.01',
191
                                    'p': '1e-3'}
192
193
194
WEAT_DATA[9]['original_finding'] = {'Ref': 'B. A. Nosek, M. Banaji, A. G. Greenwald, Harvesting implicit group attitudes and beliefs from a demonstration web site., Group Dynamics: Theory, Research, and Practice 6, 101 (2002).',
195
                                    'N': '43k',
196
                                    'd': '1.42',
197
                                    'p': '< 1e-2'}
198
199
json.dump(WEAT_DATA, open('weat.json', 'w'), indent=True)
200