Passed
Push — master ( 2c3fc9...007356 )
by
unknown
02:03
created

test_list_id_in_segment   B

Complexity

Total Complexity 44

Size/Duplication

Total Lines 382
Duplicated Lines 40.05 %

Importance

Changes 0
Metric Value
eloc 215
dl 153
loc 382
rs 8.8798
c 0
b 0
f 0
wmc 44

24 Methods

Rating   Name   Duplication   Size   Complexity  
A TestListIdInSegmentIP.test_list_id_in_segment_with_index_A() 0 8 1
A TestListIdInSegmentBase.get_collection_name() 0 6 1
A TestListIdInSegmentBase.test_list_id_in_segment_collection_name_not_existed() 0 10 2
A TestListIdInSegmentBase.test_list_id_in_segment_without_index_A() 0 12 1
A TestListIdInSegmentBase.test_list_id_in_segment_collection_name_invalid() 0 10 2
A TestListIdInSegmentJAC.test_list_id_in_segment_with_index_A() 0 8 1
A TestListIdInSegmentBase.test_list_id_in_segment_with_index_B() 15 15 2
A TestListIdInSegmentBase.test_list_id_in_segment_without_index_B() 18 18 2
A TestListIdInSegmentJAC.test_list_id_in_segment_with_index_B() 12 12 1
A TestListIdInSegmentBase.test_list_id_in_segment_with_index_A() 0 11 2
A TestListIdInSegmentBase.test_list_id_in_segment_name_not_existed() 0 10 2
A TestListIdInSegmentJAC.get_jaccard_index() 0 10 2
A TestListIdInSegmentBase.test_list_id_in_segment_after_delete_vectors() 15 15 1
A TestListIdInSegmentIP.test_list_id_in_segment_with_index_B() 12 12 1
A TestListIdInSegmentBase.test_list_id_in_segment_collection_name_None() 0 10 2
A TestListIdInSegmentBase.test_list_id_in_segment_name_None() 0 10 2
A TestListIdInSegmentIP.get_simple_index() 0 9 3
A TestListIdInSegmentJAC.test_list_id_in_segment_without_index_A() 16 16 2
A TestListIdInSegmentJAC.test_list_id_in_segment_after_delete_vectors() 15 15 1
A TestListIdInSegmentJAC.test_list_id_in_segment_without_index_B() 17 17 2
A TestListIdInSegmentIP.test_list_id_in_segment_without_index_B() 18 18 2
A TestListIdInSegmentBase.get_simple_index() 0 9 3
A TestListIdInSegmentIP.test_list_id_in_segment_after_delete_vectors() 15 15 1
A TestListIdInSegmentIP.test_list_id_in_segment_without_index_A() 0 16 2

1 Function

Rating   Name   Duplication   Size   Complexity  
A get_segment_id() 0 11 3


Duplicated Code

Duplicate code is one of the most pungent code smells. A commonly used rule is to restructure code once it is duplicated in three or more places.
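As a minimal sketch of that rule applied to this file: the loop that compares returned ids with inserted ids, and the lookup of the first segment id in the stats dictionary, each appear in several tests. The helper names `assert_ids_match` and `first_segment_id` are hypothetical and not part of the project, and the sample `stats` dictionary is made up to mirror the shape the tests index into.

```python
def assert_ids_match(returned_ids, inserted_ids):
    """Check that a segment lists exactly the inserted ids, in the same order."""
    assert len(returned_ids) == len(inserted_ids), (
        f"expected {len(inserted_ids)} ids, got {len(returned_ids)}"
    )
    for position, (got, expected) in enumerate(zip(returned_ids, inserted_ids)):
        assert got == expected, f"id mismatch at position {position}: {got} != {expected}"


def first_segment_id(stats, partition_index=0):
    """Return the id of the first segment of one partition in a stats dictionary."""
    return stats["partitions"][partition_index]["segments"][0]["id"]


# Made-up sample data with the same nesting the tests rely on.
stats = {"partitions": [{"segments": [{"id": 7}]}, {"segments": [{"id": 9}]}]}
print(first_segment_id(stats))                      # default partition
print(first_segment_id(stats, partition_index=1))   # second partition
assert_ids_match([10, 11, 12], [10, 11, 12])        # passes silently
```

With helpers like these, each duplicated test body would shrink to an insert, a flush, and two calls, which is one possible way to bring down the duplicated-lines figure reported above.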

Common duplication problems, and corresponding solutions are:

Complexity

Tip: Before tackling complexity, make sure you eliminate any duplication first. This can often reduce the size of classes significantly.

Complex classes like test_list_id_in_segment often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to finding such a component is to look for fields or methods that share the same prefixes or suffixes.
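The prefix/suffix heuristic can be tried mechanically. The sketch below takes a subset of the method names from the "24 Methods" table and groups them by the part of the name that follows `test_list_id_in_segment_`. The function names `group_by_suffix` and `shared_suffixes` are illustrative only and do not come from any tool.

```python
from collections import defaultdict

# A subset of the qualified method names listed in the report's methods table.
METHODS = [
    "TestListIdInSegmentBase.test_list_id_in_segment_without_index_B",
    "TestListIdInSegmentIP.test_list_id_in_segment_without_index_B",
    "TestListIdInSegmentJAC.test_list_id_in_segment_without_index_B",
    "TestListIdInSegmentBase.test_list_id_in_segment_after_delete_vectors",
    "TestListIdInSegmentIP.test_list_id_in_segment_after_delete_vectors",
    "TestListIdInSegmentJAC.test_list_id_in_segment_after_delete_vectors",
    "TestListIdInSegmentBase.test_list_id_in_segment_name_None",
]

PREFIX = "test_list_id_in_segment_"


def group_by_suffix(qualified_names, prefix=PREFIX):
    """Map each method-name suffix to the classes that define that method."""
    groups = defaultdict(list)
    for qualified in qualified_names:
        class_name, method = qualified.split(".", 1)
        if method.startswith(prefix):
            groups[method[len(prefix):]].append(class_name)
    return dict(groups)


def shared_suffixes(qualified_names, minimum=3):
    """Return the suffixes defined in at least `minimum` classes, sorted by name."""
    groups = group_by_suffix(qualified_names)
    return sorted(s for s, classes in groups.items() if len(classes) >= minimum)


print(shared_suffixes(METHODS))  # ['after_delete_vectors', 'without_index_B']
```

Suffixes that show up in all three classes are the natural candidates for a shared component.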

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
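One way this could look for the three test classes is sketched below. Note that it pulls the shared test body up into a common base class, which is closer to Extract Superclass than to Extract Subclass. Everything here is an assumption made for illustration: `FakeClient` is a hypothetical in-memory stand-in for the real `connect` fixture, and `ListIdChecks`, `FloatChecks`, and `BinaryChecks` are invented names.

```python
class FakeClient:
    """Hypothetical in-memory stand-in for the real connection fixture."""

    def __init__(self):
        self._rows = {}
        self._next_id = 0

    def insert(self, collection, entities):
        ids = list(range(self._next_id, self._next_id + len(entities)))
        self._next_id += len(entities)
        self._rows.setdefault(collection, []).extend(ids)
        return ids

    def delete_entity_by_id(self, collection, ids):
        self._rows[collection] = [i for i in self._rows[collection] if i not in ids]

    def list_id_in_segment(self, collection, segment_id):
        # The fake keeps a single segment per collection, so segment_id is ignored.
        return list(self._rows[collection])


class ListIdChecks:
    """Shared check body; subclasses only choose the collection and the data."""

    collection_name = None  # overridden by each subclass

    def make_entities(self, nb):
        return [[0.0] * 4 for _ in range(nb)]

    def check_after_delete(self, client):
        ids = client.insert(self.collection_name, self.make_entities(2))
        client.delete_entity_by_id(self.collection_name, [ids[0]])
        remaining = client.list_id_in_segment(self.collection_name, 0)
        assert remaining == [ids[1]]
        return remaining


class FloatChecks(ListIdChecks):
    collection_name = "float_collection"


class BinaryChecks(ListIdChecks):
    collection_name = "binary_collection"

    def make_entities(self, nb):
        return [bytes(4) for _ in range(nb)]


for checks in (FloatChecks(), BinaryChecks()):
    print(type(checks).__name__, checks.check_after_delete(FakeClient()))
```

The delete-then-list body is written once, and each variant overrides only what actually differs between the float and binary cases.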

import time
import random
import pdb
import threading
import logging
from multiprocessing import Pool, Process
import pytest
from utils import *

dim = 128
segment_row_count = 100000
nb = 6000
tag = "1970-01-01"
field_name = "float_vector"
default_index_name = "list_index"
collection_id = "list_id_in_segment"
entity = gen_entities(1)
raw_vector, binary_entity = gen_binary_entities(1)
entities = gen_entities(nb)
raw_vectors, binary_entities = gen_binary_entities(nb)
default_fields = gen_default_fields()


def get_segment_id(connect, collection, nb=1, vec_type='float', index_params=None):
    '''
    Insert nb entities, flush, optionally create an index, and return
    the inserted ids with the id of the first segment in the first partition.
    '''
    if vec_type != "float":
        vectors, entities = gen_binary_entities(nb)
    else:
        entities = gen_entities(nb)
    ids = connect.insert(collection, entities)
    connect.flush([collection])
    if index_params:
        connect.create_index(collection, field_name, default_index_name, index_params)
    stats = connect.get_collection_stats(collection)
    return ids, stats["partitions"][0]["segments"][0]["id"]


class TestListIdInSegmentBase:
    """
    ******************************************************************
      The following cases are used to test `list_id_in_segment` function
    ******************************************************************
    """
    def test_list_id_in_segment_collection_name_None(self, connect, collection):
        '''
        target: get vector ids where collection name is None
        method: call list_id_in_segment with the collection_name: None
        expected: exception raised
        '''
        collection_name = None
        ids, segment_id = get_segment_id(connect, collection)
        with pytest.raises(Exception) as e:
            connect.list_id_in_segment(collection_name, segment_id)

    def test_list_id_in_segment_collection_name_not_existed(self, connect, collection):
        '''
        target: get vector ids where collection name does not exist
        method: call list_id_in_segment with a random collection_name, which is not in db
        expected: status not ok
        '''
        collection_name = gen_unique_str(collection_id)
        ids, segment_id = get_segment_id(connect, collection)
        with pytest.raises(Exception) as e:
            vector_ids = connect.list_id_in_segment(collection_name, segment_id)

    @pytest.fixture(
        scope="function",
        params=gen_invalid_strs()
    )
    def get_collection_name(self, request):
        yield request.param

    def test_list_id_in_segment_collection_name_invalid(self, connect, collection, get_collection_name):
        '''
        target: get vector ids where collection name is invalid
        method: call list_id_in_segment with invalid collection_name
        expected: status not ok
        '''
        collection_name = get_collection_name
        ids, segment_id = get_segment_id(connect, collection)
        with pytest.raises(Exception) as e:
            connect.list_id_in_segment(collection_name, segment_id)

    def test_list_id_in_segment_name_None(self, connect, collection):
        '''
        target: get vector ids where segment name is None
        method: call list_id_in_segment with the name: None
        expected: exception raised
        '''
        ids, segment_id = get_segment_id(connect, collection)
        segment = None
        with pytest.raises(Exception) as e:
            vector_ids = connect.list_id_in_segment(collection, segment)

    def test_list_id_in_segment_name_not_existed(self, connect, collection):
        '''
        target: get vector ids where segment name does not exist
        method: call list_id_in_segment with a random segment name
        expected: status not ok
        '''
        ids, seg_id = get_segment_id(connect, collection)
        # segment = gen_unique_str(collection_id)
        with pytest.raises(Exception) as e:
            vector_ids = connect.list_id_in_segment(collection, seg_id + 10000)

    def test_list_id_in_segment_without_index_A(self, connect, collection):
        '''
        target: get vector ids when there is no index
        method: call list_id_in_segment and check if the segment contains vectors
        expected: status ok
        '''
        nb = 1
        ids, seg_id = get_segment_id(connect, collection, nb=nb)
        vector_ids = connect.list_id_in_segment(collection, seg_id)
        # vector_ids should match ids
        assert len(vector_ids) == nb
        assert vector_ids[0] == ids[0]

    def test_list_id_in_segment_without_index_B(self, connect, collection):
        '''
        target: get vector ids when there is no index but with partition
        method: create partition, add vectors to it and call list_id_in_segment, check if the segment contains vectors
        expected: status ok
        '''
        nb = 10
        entities = gen_entities(nb)
        connect.create_partition(collection, tag)
        ids = connect.insert(collection, entities, partition_tag=tag)
        connect.flush([collection])
        stats = connect.get_collection_stats(collection)
        assert stats["partitions"][1]["tag"] == tag
        vector_ids = connect.list_id_in_segment(collection, stats["partitions"][1]["segments"][0]["id"])
        # vector_ids should match ids
        assert len(vector_ids) == nb
        for i in range(nb):
            assert vector_ids[i] == ids[i]

    @pytest.fixture(
        scope="function",
        params=gen_simple_index()
    )
    def get_simple_index(self, request, connect):
        if str(connect._cmd("mode")) == "CPU":
            if request.param["index_type"] in index_cpu_not_support():
                pytest.skip("CPU not support index_type: ivf_sq8h")
        return request.param

    def test_list_id_in_segment_with_index_A(self, connect, collection, get_simple_index):
        '''
        target: get vector ids when there is index
        method: call list_id_in_segment and check if the segment contains vectors
        expected: status ok
        '''
        ids, seg_id = get_segment_id(connect, collection, nb=nb, index_params=get_simple_index)
        try:
            connect.list_id_in_segment(collection, seg_id)
        except Exception as e:
            assert False, str(e)
        # TODO:

    def test_list_id_in_segment_with_index_B(self, connect, collection, get_simple_index):
        '''
        target: get vector ids when there is index and with partition
        method: create partition, add vectors to it and call list_id_in_segment, check if the segment contains vectors
        expected: status ok
        '''
        connect.create_partition(collection, tag)
        ids = connect.insert(collection, entities, partition_tag=tag)
        connect.flush([collection])
        stats = connect.get_collection_stats(collection)
        assert stats["partitions"][1]["tag"] == tag
        try:
            connect.list_id_in_segment(collection, stats["partitions"][1]["segments"][0]["id"])
        except Exception as e:
            assert False, str(e)
        # vector_ids should match ids
        # TODO

    def test_list_id_in_segment_after_delete_vectors(self, connect, collection):
        '''
        target: get vector ids after vectors are deleted
        method: add vectors and delete a few, call list_id_in_segment
        expected: status ok, vector_ids decreased after vectors deleted
        '''
        nb = 2
        ids, seg_id = get_segment_id(connect, collection, nb=nb)
        delete_ids = [ids[0]]
        status = connect.delete_entity_by_id(collection, delete_ids)
        connect.flush([collection])
        stats = connect.get_collection_stats(collection)
        vector_ids = connect.list_id_in_segment(collection, stats["partitions"][0]["segments"][0]["id"])
        assert len(vector_ids) == 1
        assert vector_ids[0] == ids[1]


class TestListIdInSegmentIP:
    """
    ******************************************************************
      The following cases are used to test `list_id_in_segment` function
    ******************************************************************
    """
    def test_list_id_in_segment_without_index_A(self, connect, ip_collection):
        '''
        target: get vector ids when there is no index
        method: call list_id_in_segment and check if the segment contains vectors
        expected: status ok
        '''
        nb = 10
        entities = gen_entities(nb)
        ids = connect.insert(ip_collection, entities)
        connect.flush([ip_collection])
        stats = connect.get_collection_stats(ip_collection)
        vector_ids = connect.list_id_in_segment(ip_collection, stats["partitions"][0]["segments"][0]["id"])
        # vector_ids should match ids
        assert len(vector_ids) == nb
        for i in range(nb):
            assert vector_ids[i] == ids[i]

    def test_list_id_in_segment_without_index_B(self, connect, ip_collection):
        '''
        target: get vector ids when there is no index but with partition
        method: create partition, add vectors to it and call list_id_in_segment, check if the segment contains vectors
        expected: status ok
        '''
        connect.create_partition(ip_collection, tag)
        nb = 10
        entities = gen_entities(nb)
        ids = connect.insert(ip_collection, entities, partition_tag=tag)
        connect.flush([ip_collection])
        stats = connect.get_collection_stats(ip_collection)
        assert stats["partitions"][1]["tag"] == tag
        vector_ids = connect.list_id_in_segment(ip_collection, stats["partitions"][1]["segments"][0]["id"])
        # vector_ids should match ids
        assert len(vector_ids) == nb
        for i in range(nb):
            assert vector_ids[i] == ids[i]

    @pytest.fixture(
        scope="function",
        params=gen_simple_index()
    )
    def get_simple_index(self, request, connect):
        if str(connect._cmd("mode")) == "CPU":
            if request.param["index_type"] in index_cpu_not_support():
                pytest.skip("CPU not support index_type: ivf_sq8h")
        return request.param

    def test_list_id_in_segment_with_index_A(self, connect, ip_collection, get_simple_index):
        '''
        target: get vector ids when there is index
        method: call list_id_in_segment and check if the segment contains vectors
        expected: status ok
        '''
        ids, seg_id = get_segment_id(connect, ip_collection, nb=nb, index_params=get_simple_index)
        vector_ids = connect.list_id_in_segment(ip_collection, seg_id)
        # TODO:

    def test_list_id_in_segment_with_index_B(self, connect, ip_collection, get_simple_index):
        '''
        target: get vector ids when there is index and with partition
        method: create partition, add vectors to it and call list_id_in_segment, check if the segment contains vectors
        expected: status ok
        '''
        connect.create_partition(ip_collection, tag)
        ids = connect.insert(ip_collection, entities, partition_tag=tag)
        connect.flush([ip_collection])
        stats = connect.get_collection_stats(ip_collection)
        assert stats["partitions"][1]["tag"] == tag
        vector_ids = connect.list_id_in_segment(ip_collection, stats["partitions"][1]["segments"][0]["id"])
        # vector_ids should match ids
        # TODO

    def test_list_id_in_segment_after_delete_vectors(self, connect, ip_collection):
        '''
        target: get vector ids after vectors are deleted
        method: add vectors and delete a few, call list_id_in_segment
        expected: status ok, vector_ids decreased after vectors deleted
        '''
        nb = 2
        ids, seg_id = get_segment_id(connect, ip_collection, nb=nb)
        delete_ids = [ids[0]]
        status = connect.delete_entity_by_id(ip_collection, delete_ids)
        connect.flush([ip_collection])
        stats = connect.get_collection_stats(ip_collection)
        vector_ids = connect.list_id_in_segment(ip_collection, stats["partitions"][0]["segments"][0]["id"])
        assert len(vector_ids) == 1
        assert vector_ids[0] == ids[1]


class TestListIdInSegmentJAC:
    """
    ******************************************************************
      The following cases are used to test `list_id_in_segment` function
    ******************************************************************
    """
    def test_list_id_in_segment_without_index_A(self, connect, jac_collection):
        '''
        target: get vector ids when there is no index
        method: call list_id_in_segment and check if the segment contains vectors
        expected: status ok
        '''
        nb = 10
        vectors, entities = gen_binary_entities(nb)
        ids = connect.insert(jac_collection, entities)
        connect.flush([jac_collection])
        stats = connect.get_collection_stats(jac_collection)
        vector_ids = connect.list_id_in_segment(jac_collection, stats["partitions"][0]["segments"][0]["id"])
        # vector_ids should match ids
        assert len(vector_ids) == nb
        for i in range(nb):
            assert vector_ids[i] == ids[i]

    def test_list_id_in_segment_without_index_B(self, connect, jac_collection):
        '''
        target: get vector ids when there is no index but with partition
        method: create partition, add vectors to it and call list_id_in_segment, check if the segment contains vectors
        expected: status ok
        '''
        connect.create_partition(jac_collection, tag)
        nb = 10
        vectors, entities = gen_binary_entities(nb)
        ids = connect.insert(jac_collection, entities, partition_tag=tag)
        connect.flush([jac_collection])
        stats = connect.get_collection_stats(jac_collection)
        vector_ids = connect.list_id_in_segment(jac_collection, stats["partitions"][1]["segments"][0]["id"])
        # vector_ids should match ids
        assert len(vector_ids) == nb
        for i in range(nb):
            assert vector_ids[i] == ids[i]

    @pytest.fixture(
        scope="function",
        params=gen_simple_index()
    )
    def get_jaccard_index(self, request, connect):
        logging.getLogger().info(request.param)
        if request.param["index_type"] in binary_support():
            return request.param
        else:
            pytest.skip("not support")

    def test_list_id_in_segment_with_index_A(self, connect, jac_collection, get_jaccard_index):
        '''
        target: get vector ids when there is index
        method: call list_id_in_segment and check if the segment contains vectors
        expected: status ok
        '''
        ids, seg_id = get_segment_id(connect, jac_collection, nb=nb, index_params=get_jaccard_index, vec_type='binary')
        vector_ids = connect.list_id_in_segment(jac_collection, seg_id)
        # TODO:

    def test_list_id_in_segment_with_index_B(self, connect, jac_collection, get_jaccard_index):
        '''
        target: get vector ids when there is index and with partition
        method: create partition, add vectors to it and call list_id_in_segment, check if the segment contains vectors
        expected: status ok
        '''
        connect.create_partition(jac_collection, tag)
        # jac_collection holds binary vectors, so insert the module-level
        # binary_entities rather than the float entities
        ids = connect.insert(jac_collection, binary_entities, partition_tag=tag)
        connect.flush([jac_collection])
        stats = connect.get_collection_stats(jac_collection)
        assert stats["partitions"][1]["tag"] == tag
        vector_ids = connect.list_id_in_segment(jac_collection, stats["partitions"][1]["segments"][0]["id"])
        # vector_ids should match ids
        # TODO

    def test_list_id_in_segment_after_delete_vectors(self, connect, jac_collection, get_jaccard_index):
        '''
        target: get vector ids after vectors are deleted
        method: add vectors and delete a few, call list_id_in_segment
        expected: status ok, vector_ids decreased after vectors deleted
        '''
        nb = 2
        ids, seg_id = get_segment_id(connect, jac_collection, nb=nb, vec_type='binary', index_params=get_jaccard_index)
        delete_ids = [ids[0]]
        status = connect.delete_entity_by_id(jac_collection, delete_ids)
        connect.flush([jac_collection])
        stats = connect.get_collection_stats(jac_collection)
        vector_ids = connect.list_id_in_segment(jac_collection, stats["partitions"][0]["segments"][0]["id"])
        assert len(vector_ids) == 1
        assert vector_ids[0] == ids[1]