Completed
Push — 0.8.dev ( a76884...645b74 )
by Andrei
01:06
created

bang_visualizer   A

Complexity

Total Complexity 5

Size/Duplication

Total Lines 76
Duplicated Lines 0 %

Importance

Changes 0
Metric Value
dl 0
loc 76
rs 10
c 0
b 0
f 0
wmc 5

3 Methods

Rating   Name   Duplication   Size   Complexity  
A show_blocks() 0 17 1
A __draw_blocks() 0 16 2
B __draw_block() 0 28 2
1
"""!
2
3
@brief Cluster analysis algorithm: BANG.
4
@details Implementation based on paper @cite inproceedings::bang::1.
5
6
@authors Andrei Novikov ([email protected])
7
@date 2014-2018
8
@copyright GNU Public License
9
10
@cond GNU_PUBLIC_LICENSE
11
    PyClustering is free software: you can redistribute it and/or modify
12
    it under the terms of the GNU General Public License as published by
13
    the Free Software Foundation, either version 3 of the License, or
14
    (at your option) any later version.
15
16
    PyClustering is distributed in the hope that it will be useful,
17
    but WITHOUT ANY WARRANTY; without even the implied warranty of
18
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
19
    GNU General Public License for more details.
20
21
    You should have received a copy of the GNU General Public License
22
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
23
@endcond
24
25
"""
26
27
28
import matplotlib
0 ignored issues
show
Configuration introduced by
The import matplotlib could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Loading history...
29
import matplotlib.pyplot as plt
0 ignored issues
show
Configuration introduced by
The import matplotlib.pyplot could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Loading history...
30
import matplotlib.patches as patches
0 ignored issues
show
Configuration introduced by
The import matplotlib.patches could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Loading history...
31
32
from pyclustering.utils import data_corners
33
34
from pyclustering.cluster import cluster_visualizer
35
36
37
class bang_visualizer:
38
    """!
39
    @brief Visualizer of BANG algorithm's results.
40
    @details BANG visualizer provides visualization services that are specific for BANG algorithm.
41
42
    """
43
44
45
    __maximum_density_alpha = 0.6
46
47
48
    @staticmethod
49
    def show_blocks(data, directory):
50
        """!
51
        @brief Show BANG-blocks (leafs only) in data space.
52
        @details BANG-blocks represents grid that was used for clustering process.
53
54
        @param[in] data (list): Input data space that contains points where each point is also represented by list.
55
        @param[in] directory (bang_directory): Directory that was created by BANG algorithm during clustering process.
56
57
        """
58
        visualizer = cluster_visualizer()
59
        visualizer.append_cluster(data)
60
61
        figure = visualizer.show(display=False)
62
63
        bang_visualizer.__draw_blocks(figure, 0, directory.get_leafs())
64
        plt.show()
65
66
67
    @staticmethod
68
    def __draw_blocks(figure, ax_index, blocks):
69
        """!
70
        @brief Display BANG-blocks on specified figure.
71
72
        @param[in] figure (figure): Figure that is used for drawing blocks.
73
        @param[in] ax_index (uint): Index of axis where blocks should be displayed.
74
        @param[in] blocks (list): List of blocks that should be displyed.
75
76
        """
77
        ax = figure.get_axes()[ax_index]
78
        ax.grid(False)
79
80
        density_scale = blocks[-1].get_density()
81
        for block in blocks:
82
            bang_visualizer.__draw_block(ax, block, density_scale)
83
84
85
    @staticmethod
86
    def __draw_block(ax, block, density_scale):
87
        """!
88
        @brief Display BANG-block on the specified ax.
89
90
        @param[in] ax (Axis): Axis where block should be displayed.
91
        @param[in] block (bang_block): BANG-block that should be displayed.
92
        @param[in] density_scale (double): Max density to display density of the block by appropriate tone.
93
94
        """
95
        max_corner, min_corner = block.get_spatial_block().get_corners()
96
        belong_cluster = block.get_cluster() is not None
97
98
        density_scale = bang_visualizer.__maximum_density_alpha * block.get_density() / density_scale
99
100
        face_color = matplotlib.colors.to_rgba('blue', alpha=density_scale)
101
        edge_color = matplotlib.colors.to_rgba('black', alpha=1.0)
102
103
        if len(max_corner) == 2:
104
            rect = patches.Rectangle(min_corner, max_corner[0] - min_corner[0], max_corner[1] - min_corner[1],
105
                                     fill=belong_cluster,
106
                                     facecolor=face_color,
107
                                     edgecolor=edge_color,
108
                                     linewidth=0.5)
109
            ax.add_patch(rect)
110
111
        else:
112
            raise ValueError("Impossible to display blocks on non-2D dimensional data.")
113
114
115
116
class bang_directory:
117
    """!
118
    @brief BANG directory stores BANG-blocks that represents grid in data space.
119
    @details The directory build BANG-blocks in binary tree manner. Leafs of the tree stored separately to provide
120
              a direct access to the leafs that should be analysed. Leafs cache data-points.
121
122
    """
123
    def __init__(self, data, levels, density_threshold=0.0):
124
        """!
125
        @brief Create BANG directory - basically tree structure with direct access to leafs.
126
127
        @param[in] levels (uint): Height of the blocks tree.
128
        @param[in] density_threshold (double): The lowest level of density when contained data by bang-block is
129
                    considered as a noise and there is no need to split it till the last level.
130
131
        """
132
        self.__data = data
133
        self.__levels = levels
134
        self.__density_threshold = density_threshold
135
        self.__leafs = []
136
        self.__root = None
137
138
        self.__create_directory()
139
140
141
    def get_leafs(self):
142
        """!
143
        @brief Return leafs - the smallest blocks.
144
        @details Some leafs can be bigger than others because splitting is not performed for blocks whose density is
145
                  less than threshold.
146
147
        @return (list) List of blocks that are leafs of BANG directory.
148
149
        """
150
        return self.__leafs
151
152
153
    def __create_directory(self):
154
        """!
155
        @brief Create BANG directory as a tree with separate storage for leafs.
156
157
        """
158
159
        min_corner, max_corner = data_corners(self.__data)
160
        data_block = spatial_block(max_corner, min_corner)
161
162
        cache_require = (self.__levels == 1)
163
        self.__root = bang_block(self.__data, 0, 0, data_block, cache_require)
164
165
        if cache_require:
166
            self.__leafs.append(self.__root)
167
        else:
168
            self.__build_directory_levels()
169
170
171
    def __build_directory_levels(self):
172
        """!
173
        @brief Build levels of direction if amount of level is greater than one.
174
175
        """
176
        previous_level_blocks = [ self.__root ]
177
        for level in range(1, self.__levels):
178
            previous_level_blocks = self.__build_level(previous_level_blocks, level)
179
180
        self.__leafs = sorted(self.__leafs, key=lambda block: block.get_density())
181
182
183
    def __build_level(self, previous_level_blocks, level):
184
        """!
185
        @brief Build new level of directory.
186
187
        @param[in] previous_level_blocks (list): BANG-blocks on the previous level.
188
        @param[in] level (uint): Level number that should be built.
189
190
        @return (list) New block on the specified level.
191
192
        """
193
        current_level_blocks = []
194
195
        split_dimension = level % len(self.__data[0])
196
        cache_require = (level == self.__levels - 1)
197
198
        for block in previous_level_blocks:
199
            self.__split_block(block, split_dimension, cache_require, current_level_blocks)
200
201
        if cache_require:
202
            self.__leafs += current_level_blocks
203
204
        return current_level_blocks
205
206
207
    def __split_block(self, block, split_dimension, cache_require, current_level_blocks):
208
        """!
209
        @brief Split specific block in specified dimension.
210
        @details Split is not performed for block whose density is lower than threshold value, such blocks are putted to
211
                  leafs.
212
213
        @param[in] block (bang_block): BANG-block that should be split.
214
        @param[in] split_dimension (uint): Dimension at which splitting should be performed.
215
        @param[in] cache_require (bool): Defines when points in cache should be stored during density calculation.
216
        @param[in|out] current_level_blocks (list): Block storage at the current level where new blocks should be added.
217
218
        """
219
        if block.get_density() <= self.__density_threshold:
220
            self.__leafs.append(block)
221
222
        else:
223
            left, right = block.split(split_dimension, cache_require)
224
            current_level_blocks.append(left)
225
            current_level_blocks.append(right)
226
227
228
229
class spatial_block:
230
    """!
231
    @brief Geometrical description of BANG block in data space.
232
    @details Provides services related to spatial function and used by bang_block
233
234
    @see bang_block
235
236
    """
237
238
    def __init__(self, max_corner, min_corner):
239
        """!
240
        @brief Creates spatial block in data space.
241
242
        @param[in] max_corner (array_like): Maximum corner coordinates of the block.
243
        @param[in] min_corner (array_like): Minimal corner coordinates of the block.
244
245
        """
246
        self.__max_corner = max_corner
247
        self.__min_corner = min_corner
248
        self.__volume = self.__calculate_volume()
249
250
251
    def __str__(self):
252
        """!
253
        @brief Returns string block description.
254
255
        @return String representation of the block.
256
257
        """
258
        return "(max: %s; min: %s)" % (self.__max_corner, self.__min_corner)
259
260
261
    def __contains__(self, point):
262
        """!
263
        @brief Point is considered as contained if it lies in block (belong to it).
264
265
        @return (bool) True if point is in block, otherwise False.
266
267
        """
268
        for i in range(len(point)):
269
            if point[i] < self.__min_corner[i] or point[i] > self.__max_corner[i]:
270
                return False
271
272
        return True
273
274
275
    def get_corners(self):
276
        """!
277
        @brief Return spatial description of current block.
278
279
        @return (tuple) Pair of maximum and minimum corners (max_corner, min_corner).
280
281
        """
282
        return self.__max_corner, self.__min_corner
283
284
285
    def get_volume(self):
286
        """!
287
        @brief Returns volume of current block.
288
        @details Volume block has uncommon mining here: for 1D is length of a line, for 2D is square of rectangle,
289
                  for 3D is volume of 3D figure, and for ND is volume of ND figure.
290
291
        @return (double) Volume of current block.
292
293
        """
294
        return self.__volume
295
296
297
    def split(self, dimension):
298
        """!
299
        @brief Split current block into two spatial blocks in specified dimension.
300
301
        @param[in] dimension (uint): Dimension where current block should be split.
302
303
        @return (tuple) Pair of new split blocks from current block.
304
305
        """
306
        first_max_corner = self.__max_corner[:]
307
        second_min_corner = self.__min_corner[:]
308
309
        split_border = (self.__max_corner[dimension] + self.__min_corner[dimension]) / 2.0
310
311
        first_max_corner[dimension] = split_border
312
        second_min_corner[dimension] = split_border
313
314
        return spatial_block(first_max_corner, self.__min_corner), spatial_block(self.__max_corner, second_min_corner)
315
316
317
    def is_neighbor(self, block):
318
        """!
319
        @brief Performs calculation to identify whether specified block is neighbor of current block.
320
321
        @param[in] block (spatial_block): Another block that is check whether it is neighbor.
322
323
        @return (bool) True is blocks are neighbors, False otherwise.
324
325
        """
326
        if block is not self:
327
            block_max_corner, _ = block.get_corners()
328
            dimension = len(block_max_corner)
329
            neighborhood_score = self.__calculate_neighborhood(block_max_corner)
330
331
            if neighborhood_score == dimension:
332
                return True
333
334
        return False
335
336
337
    def __calculate_neighborhood(self, block_max_corner):
338
        """!
339
        @brief Calculates neighborhood score that defined whether blocks are neighbors.
340
341
        @param[in] block_max_corner (list): Maximum coordinates of other block.
342
343
        @return (uint) Neighborhood score.
344
345
        """
346
        dimension = len(block_max_corner)
347
348
        length_edges = [self.__max_corner[i] - self.__min_corner[i] for i in range(dimension)]
349
350
        neighborhood_score = 0
351
        for i in range(dimension):
352
            diff = abs(block_max_corner[i] - self.__max_corner[i])
353
354
            if diff <= length_edges[i] + length_edges[i] * 0.0001:
355
                neighborhood_score += 1
356
357
        return neighborhood_score
358
359
360
    def __calculate_volume(self):
361
        """!
362
        @brief Calculates volume of current spatial block.
363
364
        @return (double) Volume of current spatial block.
365
366
        """
367
        volume = self.__max_corner[0] - self.__min_corner[0]
368
        for i in range(1, len(self.__max_corner)):
369
            volume *= self.__max_corner[i] - self.__min_corner[i]
370
371
        return volume
372
373
374
375
class bang_block:
376
    """!
377
    @brief BANG-block that represent spatial region in data space.
378
379
    """
380
    def __init__(self, data, region, level, space_block, cache_points=False):
381
        self.__data = data
382
        self.__region_number = region
383
        self.__level = level
384
        self.__spatial_block = space_block
385
        self.__cache_points = cache_points
386
387
        self.__cluster = None
388
        self.__points = None
389
        self.__density = self.__calculate_density()
390
391
392
    def __str__(self):
393
        return "(" + str(self.__region_number) + ", " + str(self.__level) + ")"
394
395
396
    def get_region(self):
397
        """!
398
        @brief Returns region number of BANG-block.
399
        @details Region number is unique on among region numbers on a directory level. Pair of region number and level
400
                  is unique for all directory.
401
402
        @return (uint) Region number.
403
404
        """
405
        return self.__region_number
406
407
408
    def get_density(self):
409
        """!
410
        @brief Returns density of the BANG-block.
411
412
        @return (double) BANG-block density.
413
414
        """
415
        return self.__density
416
417
418
    def get_cluster(self):
419
        """!
420
        @brief Return index of cluster to which the BANG-block belongs to.
421
        @details Index of cluster may have None value if the block was not assigned to any cluster.
422
423
        @return (uint) Index of cluster or None if the block does not belong to any cluster.
424
425
        """
426
        return self.__cluster
427
428
429
    def get_spatial_block(self):
430
        """!
431
        @brief Return spatial block - BANG-block description in data space.
432
433
        @return (spatial_block) Spatial block of the BANG-block.
434
435
        """
436
        return self.__spatial_block
437
438
439
    def get_points(self):
440
        """!
441
        @brief Return points that covers by the BANG-block.
442
        @details Returns None if block is not leaf.
443
444
        @return (list) List of point indexes that are covered by the block.
445
446
        """
447
        return self.__points
448
449
450
    def set_cluster(self, index):
451
        """!
452
        @brief Assign cluster to the BANG-block by index.
453
454
        @param[in] index (uint): Index cluster that is assigned to BANG-block.
455
456
        """
457
        self.__cluster = index
458
459
460
    def is_neighbor(self, block):
461
        """!
462
        @brief Performs calculation to check whether specified block is neighbor to the current.
463
464
        @param[in] block (bang_block): Other BANG-block that should be checked for neighborhood.
465
466
        @return (bool) True if blocks are neighbors, False if blocks are not neighbors.
467
468
        """
469
        return self.get_spatial_block().is_neighbor(block.get_spatial_block())
470
471
472
    def split(self, split_dimension, cache_points):
473
        """!
474
        @brief Split BANG-block into two new blocks in specified dimension.
475
476
        @param[in] split_dimension (uint): Dimension where block should be split.
477
        @param[in] cache_points (bool): If True then covered points are cached. Used for leaf blocks.
478
479
        @return (tuple) Pair of BANG-block that were formed from the current.
480
481
        """
482
        left_region_number = self.__region_number
483
        right_region_number = self.__region_number + 2 ** self.__level
484
485
        first_spatial_block, second_spatial_block = self.__spatial_block.split(split_dimension)
486
487
        left = bang_block(self.__data, left_region_number, self.__level + 1, first_spatial_block, cache_points)
488
        right = bang_block(self.__data, right_region_number, self.__level + 1, second_spatial_block, cache_points)
489
490
        return left, right
491
492
493
    def __calculate_density(self):
494
        """!
495
        @brief Calculates BANG-block density.
496
497
        @return (double) BANG-block density.
498
499
        """
500
        return self.__get_amount_points() / self.__spatial_block.get_volume()
501
502
503
    def __get_amount_points(self):
504
        """!
505
        @brief Count covered points by the BANG-block and if cache is enable then covered points are stored.
506
507
        @return (uint) Amount of covered points.
508
509
        """
510
        amount = 0
511
        for index in range(len(self.__data)):
512
            if self.__data[index] in self.__spatial_block:
513
                self.__cache_point(index)
514
                amount += 1
515
516
        return amount
517
518
519
    def __cache_point(self, index):
520
        """!
521
        @brief Store index points.
522
523
        @param[in] index (uint): Index point that should be stored.
524
525
        """
526
        if self.__cache_points:
527
            if self.__points is None:
528
                self.__points = []
529
530
            self.__points.append(index)
531
532
533
534
class bang:
535
    def __init__(self, data, levels, density_threshold = 0.0):
536
        self.__data = data
537
        self.__levels = levels
538
        self.__directory = None
539
        self.__clusters = []
540
        self.__noise = []
541
        self.__cluster_blocks = []
542
        self.__density_threshold = density_threshold
543
544
545
    def process(self):
546
        self.__validate_arguments()
547
548
        self.__directory = bang_directory(self.__data, self.__levels, self.__density_threshold)
549
        self.__allocate_clusters()
550
551
552
    def get_clusters(self):
553
        return self.__clusters
554
555
556
    def get_noise(self):
557
        return self.__noise
558
559
560
    def get_directory(self):
561
        return self.__directory
562
563
564
    def __validate_arguments(self):
565
        if self.__levels <= 0:
566
            raise ValueError("Incorrect amount of levels '%d'. Level value should be greater than 0." % self.__levels)
567
568
        if len(self.__data) == 0:
569
            raise ValueError("Empty input data. Data should contain at least one point.")
570
571
        if self.__density_threshold < 0:
572
            raise ValueError("Incorrect density threshold '%f'. Density threshold should not be negative." % self.__density_threshold)
573
574
575
    def __allocate_clusters(self):
576
        leaf_blocks = self.__directory.get_leafs()
577
        unhandled_block_indexes = set([i for i in range(len(leaf_blocks)) if leaf_blocks[i].get_density() > self.__density_threshold])
578
        appropriate_block_indexes = set(unhandled_block_indexes)
579
580
        current_block = self.__find_block_center(leaf_blocks)
581
        cluster_index = 0
582
583
        while current_block is not None:
584
            if current_block.get_density() <= self.__density_threshold:
585
                break
586
587
            self.__expand_cluster_block(current_block, cluster_index, leaf_blocks, unhandled_block_indexes)
588
589
            current_block = self.__find_block_center(leaf_blocks)
590
            cluster_index += 1
591
592
        self.__store_clustering_results(cluster_index, appropriate_block_indexes, leaf_blocks)
593
594
595
    def __expand_cluster_block(self, block, cluster_index, leaf_blocks, unhandled_block_indexes):
596
        block.set_cluster(cluster_index)
597
598
        neighbors = self.__find_block_neighbors(block, leaf_blocks, unhandled_block_indexes)
599
        for neighbor in neighbors:
600
            neighbor.set_cluster(cluster_index)
601
            neighbors += self.__find_block_neighbors(neighbor, leaf_blocks, unhandled_block_indexes)
602
603
604
    def __store_clustering_results(self, amount_clusters, appropriate_block_indexes, leaf_blocks):
605
        self.__clusters = [[] for _ in range(amount_clusters)]
606
        for appropriate_index in appropriate_block_indexes:
607
            block = leaf_blocks[appropriate_index]
608
            index = block.get_cluster()
609
610
            if index is not None:
611
                self.__clusters[index] += block.get_points()
612
            else:
613
                self.__noise += block.get_points()
614
615
        self.__clusters = [ list(set(cluster)) for cluster in self.__clusters ]
616
        self.__noise = list(set(self.__noise))
617
618
619
    def __find_block_center(self, level_blocks):
620
        for i in reversed(range(len(level_blocks))):
621
            if level_blocks[i].get_cluster() is None:
622
                return level_blocks[i]
623
624
        return None
625
626
627
    def __find_block_neighbors(self, block, level_blocks, unhandled_block_indexes):
628
        neighbors = []
629
630
        handled_block_indexes = []
631
        for unhandled_index in unhandled_block_indexes:
632
            if block.is_neighbor(level_blocks[unhandled_index]):
633
                handled_block_indexes.append(unhandled_index)
634
                neighbors.append(level_blocks[unhandled_index])
635
636
                # Maximum number of neighbors is eight
637
                if len(neighbors) == 8:
638
                    break
639
640
        for handled_index in handled_block_indexes:
641
            unhandled_block_indexes.remove(handled_index)
642
643
        return neighbors
644