Completed
Push — 0.8.dev ( 85e2ef...825d19 )
by Andrei
01:00
created

bang_visualizer.show_clusters()   A

Complexity

Conditions 1

Size

Total Lines 14

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
cc 1
dl 0
loc 14
rs 9.4285
c 0
b 0
f 0
1
"""!
2
3
@brief Cluster analysis algorithm: BANG.
4
@details Implementation based on paper @cite inproceedings::bang::1.
5
6
@authors Andrei Novikov ([email protected])
7
@date 2014-2018
8
@copyright GNU Public License
9
10
@cond GNU_PUBLIC_LICENSE
11
    PyClustering is free software: you can redistribute it and/or modify
12
    it under the terms of the GNU General Public License as published by
13
    the Free Software Foundation, either version 3 of the License, or
14
    (at your option) any later version.
15
16
    PyClustering is distributed in the hope that it will be useful,
17
    but WITHOUT ANY WARRANTY; without even the implied warranty of
18
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
19
    GNU General Public License for more details.
20
21
    You should have received a copy of the GNU General Public License
22
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
23
@endcond
24
25
"""
26
27
28
import matplotlib
0 ignored issues
show
Configuration introduced by
The import matplotlib could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Loading history...
29
import matplotlib.gridspec as gridspec;
0 ignored issues
show
Configuration introduced by
The import matplotlib.gridspec could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Loading history...
30
import matplotlib.pyplot as plt
0 ignored issues
show
Configuration introduced by
The import matplotlib.pyplot could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Loading history...
31
import matplotlib.patches as patches
0 ignored issues
show
Configuration introduced by
The import matplotlib.patches could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Loading history...
32
33
import itertools
34
35
from pyclustering.cluster import cluster_visualizer
36
from pyclustering.cluster.encoder import type_encoding
37
38
from pyclustering.utils import data_corners
39
from pyclustering.utils.color import color as color_list
40
41
42
43
class bang_visualizer:
44
    """!
45
    @brief Visualizer of BANG algorithm's results.
46
    @details BANG visualizer provides visualization services that are specific for BANG algorithm.
47
48
    """
49
50
51
    __maximum_density_alpha = 0.6
52
53
54
    @staticmethod
55
    def show_blocks(directory):
56
        """!
57
        @brief Show BANG-blocks (leafs only) in data space.
58
        @details BANG-blocks represents grid that was used for clustering process.
59
60
        @param[in] directory (bang_directory): Directory that was created by BANG algorithm during clustering process.
61
62
        """
63
64
        dimension = len(directory.get_data()[0])
65
66
        amount_canvases = 1
67
        if dimension > 1:
68
            amount_canvases = int(dimension * (dimension - 1) / 2)
69
70
        figure = plt.figure();
71
        grid_spec = gridspec.GridSpec(1, amount_canvases);
72
73
        pairs = list(itertools.combinations(range(dimension), 2))
74
        if len(pairs) == 0: pairs = [(0, 0)]
75
76
        for index in range(amount_canvases):
77
            ax = figure.add_subplot(grid_spec[index]);
78
            bang_visualizer.__draw_blocks(ax, directory.get_leafs(), pairs[index])
79
            bang_visualizer.__draw_two_dimension_data(ax, directory.get_data(), pairs[index])
80
81
        plt.show()
82
83
84
    @staticmethod
85
    def show_dendrogram(dendrogram):
86
        """!
87
        @brief Display dendrogram of BANG-blocks.
88
89
        @param[in] dendrogram (list): List representation of dendrogram of BANG-blocks.
90
91
        @see bang.get_dendrogram()
92
93
        """
94
        plt.figure()
95
        axis = plt.subplot(1, 1, 1)
96
97
        current_position = 0
98
        for index_cluster in range(len(dendrogram)):
99
            densities = [ block.get_density() for block in dendrogram[index_cluster] ]
100
            xrange = range(current_position, current_position + len(densities))
101
102
            axis.bar(xrange, densities, 1.0, linewidth=0.0, color=color_list.get_color(index_cluster))
103
104
            current_position += len(densities)
105
106
        axis.set_ylabel("density")
107
        axis.set_xlabel("block")
108
        axis.xaxis.set_ticklabels([]);
109
110
        plt.xlim([-0.5, current_position - 0.5])
111
        plt.show()
112
113
114
    @staticmethod
115
    def show_clusters(data, clusters, noise=[]):
0 ignored issues
show
Bug Best Practice introduced by
The default value [] might cause unintended side-effects.

Objects as default values are only created once in Python and not on each invocation of the function. If the default object is modified, this modification is carried over to the next invocation of the method.

# Bad:
# If array_param is modified inside the function, the next invocation will
# receive the modified object.
def some_function(array_param=[]):
    # ...

# Better: Create an array on each invocation
def some_function(array_param=None):
    array_param = array_param or []
    # ...
Loading history...
116
        """!
117
        @brief Display K-Means clustering results.
118
119
        @param[in] sample (list): Dataset that was used for clustering.
120
        @param[in] clusters (array_like): Clusters that were allocated by the algorithm.
121
        @param[in] noise (array_like): Noise that were allocated by the algorithm.
122
123
        """
124
        visualizer = cluster_visualizer()
125
        visualizer.append_clusters(clusters, data)
126
        visualizer.append_cluster(noise, data, marker='x')
127
        visualizer.show()
128
129
130
    @staticmethod
131
    def __draw_two_dimension_data(ax, data, pair):
132
        """!
133
        @brief Display data in two-dimensional canvas.
134
135
        @param[in] ax (Axis): Canvas where data should be displayed.
136
        @param[in] data (list): Data points that should be displayed.
137
        @param[in] pair (tuple): Pair of dimension indexes.
138
139
        """
140
        ax.set_xlabel("x%d" % pair[0])
141
        ax.set_ylabel("x%d" % pair[1])
142
143
        for point in data:
144
            if len(data[0]) > 1:
145
                ax.plot(point[pair[0]], point[pair[1]], color='red', marker='.')
146
            else:
147
                ax.plot(point[pair[0]], 0, color='red', marker='.')
148
                ax.yaxis.set_ticklabels([]);
149
150
151
    @staticmethod
152
    def __draw_blocks(ax, blocks, pair):
153
        """!
154
        @brief Display BANG-blocks on specified figure.
155
156
        @param[in] ax (Axis): Axis where bang-blocks should be displayed.
157
        @param[in] blocks (list): List of blocks that should be displyed.
158
        @param[in] pair (tuple): Pair of coordinate index that should be displyed.
159
160
        """
161
        ax.grid(False)
162
163
        density_scale = blocks[-1].get_density()
164
        for block in blocks:
165
            bang_visualizer.__draw_block(ax, pair, block, density_scale)
166
167
168
    @staticmethod
169
    def __draw_block(ax, pair, block, density_scale):
170
        """!
171
        @brief Display BANG-block on the specified ax.
172
173
        @param[in] ax (Axis): Axis where block should be displayed.
174
        @param[in] pair (tuple): Pair of coordinate index that should be displayed.
175
        @param[in] block (bang_block): BANG-block that should be displayed.
176
        @param[in] density_scale (double): Max density to display density of the block by appropriate tone.
177
178
        """
179
        max_corner, min_corner = bang_visualizer.__get_rectangle_description(block, pair)
180
181
        belong_cluster = block.get_cluster() is not None
182
183
        density_scale = bang_visualizer.__maximum_density_alpha * block.get_density() / density_scale
184
185
        face_color = matplotlib.colors.to_rgba('blue', alpha=density_scale)
186
        edge_color = matplotlib.colors.to_rgba('black', alpha=1.0)
187
188
        rect = patches.Rectangle(min_corner, max_corner[0] - min_corner[0], max_corner[1] - min_corner[1],
189
                                 fill=belong_cluster,
190
                                 facecolor=face_color,
191
                                 edgecolor=edge_color,
192
                                 linewidth=0.5)
193
        ax.add_patch(rect)
194
195
196
    @staticmethod
197
    def __get_rectangle_description(block, pair):
198
        """!
199
        @brief Create rectangle description for block in specific dimension.
200
201
        @param[in] pair (tuple): Pair of coordinate index that should be displayed.
202
        @param[in] block (bang_block): BANG-block that should be displayed
203
204
        @return (tuple) Pair of corners that describes rectangle.
205
206
        """
207
        max_corner, min_corner = block.get_spatial_block().get_corners()
208
209
        max_corner = [max_corner[pair[0]], max_corner[pair[1]]]
210
        min_corner = [min_corner[pair[0]], min_corner[pair[1]]]
211
212
        if pair == (0, 0):
213
            max_corner[1], min_corner[1] = 1.0, -1.0
214
215
        return max_corner, min_corner
216
217
218
class bang_directory:
219
    """!
220
    @brief BANG directory stores BANG-blocks that represents grid in data space.
221
    @details The directory build BANG-blocks in binary tree manner. Leafs of the tree stored separately to provide
222
              a direct access to the leafs that should be analysed. Leafs cache data-points.
223
224
    """
225
    def __init__(self, data, levels, density_threshold=0.0):
226
        """!
227
        @brief Create BANG directory - basically tree structure with direct access to leafs.
228
229
        @param[in] levels (uint): Height of the blocks tree.
230
        @param[in] density_threshold (double): The lowest level of density when contained data by bang-block is
231
                    considered as a noise and there is no need to split it till the last level.
232
233
        """
234
        self.__data = data
235
        self.__levels = levels
236
        self.__density_threshold = density_threshold
237
        self.__leafs = []
238
        self.__root = None
239
240
        self.__create_directory()
241
242
243
    def get_data(self):
244
        """!
245
        @brief Return data that is stored in the directory.
246
247
        @return (list) List of points that represents stored data.
248
249
        """
250
        return self.__data
251
252
253
    def get_leafs(self):
254
        """!
255
        @brief Return leafs - the smallest blocks.
256
        @details Some leafs can be bigger than others because splitting is not performed for blocks whose density is
257
                  less than threshold.
258
259
        @return (list) List of blocks that are leafs of BANG directory.
260
261
        """
262
        return self.__leafs
263
264
265
    def __create_directory(self):
266
        """!
267
        @brief Create BANG directory as a tree with separate storage for leafs.
268
269
        """
270
271
        min_corner, max_corner = data_corners(self.__data)
272
        data_block = spatial_block(max_corner, min_corner)
273
274
        cache_require = (self.__levels == 1)
275
        self.__root = bang_block(self.__data, 0, 0, data_block, cache_require)
276
277
        if cache_require:
278
            self.__leafs.append(self.__root)
279
        else:
280
            self.__build_directory_levels()
281
282
283
    def __build_directory_levels(self):
284
        """!
285
        @brief Build levels of direction if amount of level is greater than one.
286
287
        """
288
        previous_level_blocks = [ self.__root ]
289
        for level in range(1, self.__levels):
290
            previous_level_blocks = self.__build_level(previous_level_blocks, level)
291
292
        self.__leafs = sorted(self.__leafs, key=lambda block: block.get_density())
293
294
295
    def __build_level(self, previous_level_blocks, level):
296
        """!
297
        @brief Build new level of directory.
298
299
        @param[in] previous_level_blocks (list): BANG-blocks on the previous level.
300
        @param[in] level (uint): Level number that should be built.
301
302
        @return (list) New block on the specified level.
303
304
        """
305
        current_level_blocks = []
306
307
        split_dimension = level % len(self.__data[0])
308
        cache_require = (level == self.__levels - 1)
309
310
        for block in previous_level_blocks:
311
            self.__split_block(block, split_dimension, cache_require, current_level_blocks)
312
313
        if cache_require:
314
            self.__leafs += current_level_blocks
315
316
        return current_level_blocks
317
318
319
    def __split_block(self, block, split_dimension, cache_require, current_level_blocks):
320
        """!
321
        @brief Split specific block in specified dimension.
322
        @details Split is not performed for block whose density is lower than threshold value, such blocks are putted to
323
                  leafs.
324
325
        @param[in] block (bang_block): BANG-block that should be split.
326
        @param[in] split_dimension (uint): Dimension at which splitting should be performed.
327
        @param[in] cache_require (bool): Defines when points in cache should be stored during density calculation.
328
        @param[in|out] current_level_blocks (list): Block storage at the current level where new blocks should be added.
329
330
        """
331
        if block.get_density() <= self.__density_threshold:
332
            self.__leafs.append(block)
333
334
        else:
335
            left, right = block.split(split_dimension, cache_require)
336
            current_level_blocks.append(left)
337
            current_level_blocks.append(right)
338
339
340
341
class spatial_block:
342
    """!
343
    @brief Geometrical description of BANG block in data space.
344
    @details Provides services related to spatial function and used by bang_block
345
346
    @see bang_block
347
348
    """
349
350
    def __init__(self, max_corner, min_corner):
351
        """!
352
        @brief Creates spatial block in data space.
353
354
        @param[in] max_corner (array_like): Maximum corner coordinates of the block.
355
        @param[in] min_corner (array_like): Minimal corner coordinates of the block.
356
357
        """
358
        self.__max_corner = max_corner
359
        self.__min_corner = min_corner
360
        self.__volume = self.__calculate_volume()
361
362
363
    def __str__(self):
364
        """!
365
        @brief Returns string block description.
366
367
        @return String representation of the block.
368
369
        """
370
        return "(max: %s; min: %s)" % (self.__max_corner, self.__min_corner)
371
372
373
    def __contains__(self, point):
374
        """!
375
        @brief Point is considered as contained if it lies in block (belong to it).
376
377
        @return (bool) True if point is in block, otherwise False.
378
379
        """
380
        for i in range(len(point)):
381
            if point[i] < self.__min_corner[i] or point[i] > self.__max_corner[i]:
382
                return False
383
384
        return True
385
386
387
    def get_corners(self):
388
        """!
389
        @brief Return spatial description of current block.
390
391
        @return (tuple) Pair of maximum and minimum corners (max_corner, min_corner).
392
393
        """
394
        return self.__max_corner, self.__min_corner
395
396
397
    def get_volume(self):
398
        """!
399
        @brief Returns volume of current block.
400
        @details Volume block has uncommon mining here: for 1D is length of a line, for 2D is square of rectangle,
401
                  for 3D is volume of 3D figure, and for ND is volume of ND figure.
402
403
        @return (double) Volume of current block.
404
405
        """
406
        return self.__volume
407
408
409
    def split(self, dimension):
410
        """!
411
        @brief Split current block into two spatial blocks in specified dimension.
412
413
        @param[in] dimension (uint): Dimension where current block should be split.
414
415
        @return (tuple) Pair of new split blocks from current block.
416
417
        """
418
        first_max_corner = self.__max_corner[:]
419
        second_min_corner = self.__min_corner[:]
420
421
        split_border = (self.__max_corner[dimension] + self.__min_corner[dimension]) / 2.0
422
423
        first_max_corner[dimension] = split_border
424
        second_min_corner[dimension] = split_border
425
426
        return spatial_block(first_max_corner, self.__min_corner), spatial_block(self.__max_corner, second_min_corner)
427
428
429
    def is_neighbor(self, block):
430
        """!
431
        @brief Performs calculation to identify whether specified block is neighbor of current block.
432
433
        @param[in] block (spatial_block): Another block that is check whether it is neighbor.
434
435
        @return (bool) True is blocks are neighbors, False otherwise.
436
437
        """
438
        if block is not self:
439
            block_max_corner, _ = block.get_corners()
440
            dimension = len(block_max_corner)
441
            neighborhood_score = self.__calculate_neighborhood(block_max_corner)
442
443
            if neighborhood_score == dimension:
444
                return True
445
446
        return False
447
448
449
    def __calculate_neighborhood(self, block_max_corner):
450
        """!
451
        @brief Calculates neighborhood score that defined whether blocks are neighbors.
452
453
        @param[in] block_max_corner (list): Maximum coordinates of other block.
454
455
        @return (uint) Neighborhood score.
456
457
        """
458
        dimension = len(block_max_corner)
459
460
        length_edges = [self.__max_corner[i] - self.__min_corner[i] for i in range(dimension)]
461
462
        neighborhood_score = 0
463
        for i in range(dimension):
464
            diff = abs(block_max_corner[i] - self.__max_corner[i])
465
466
            if diff <= length_edges[i] + length_edges[i] * 0.0001:
467
                neighborhood_score += 1
468
469
        return neighborhood_score
470
471
472
    def __calculate_volume(self):
473
        """!
474
        @brief Calculates volume of current spatial block.
475
476
        @return (double) Volume of current spatial block.
477
478
        """
479
        volume = self.__max_corner[0] - self.__min_corner[0]
480
        for i in range(1, len(self.__max_corner)):
481
            volume *= self.__max_corner[i] - self.__min_corner[i]
482
483
        return volume
484
485
486
487
class bang_block:
488
    """!
489
    @brief BANG-block that represent spatial region in data space.
490
491
    """
492
    def __init__(self, data, region, level, space_block, cache_points=False):
493
        self.__data = data
494
        self.__region_number = region
495
        self.__level = level
496
        self.__spatial_block = space_block
497
        self.__cache_points = cache_points
498
499
        self.__cluster = None
500
        self.__points = None
501
        self.__density = self.__calculate_density()
502
503
504
    def __str__(self):
505
        """!
506
        @brief Returns string representation of BANG-block using region number and level where block is located.
507
508
        """
509
        return "(" + str(self.__region_number) + ", " + str(self.__level) + ")"
510
511
512
    def get_region(self):
513
        """!
514
        @brief Returns region number of BANG-block.
515
        @details Region number is unique on among region numbers on a directory level. Pair of region number and level
516
                  is unique for all directory.
517
518
        @return (uint) Region number.
519
520
        """
521
        return self.__region_number
522
523
524
    def get_density(self):
525
        """!
526
        @brief Returns density of the BANG-block.
527
528
        @return (double) BANG-block density.
529
530
        """
531
        return self.__density
532
533
534
    def get_cluster(self):
535
        """!
536
        @brief Return index of cluster to which the BANG-block belongs to.
537
        @details Index of cluster may have None value if the block was not assigned to any cluster.
538
539
        @return (uint) Index of cluster or None if the block does not belong to any cluster.
540
541
        """
542
        return self.__cluster
543
544
545
    def get_spatial_block(self):
546
        """!
547
        @brief Return spatial block - BANG-block description in data space.
548
549
        @return (spatial_block) Spatial block of the BANG-block.
550
551
        """
552
        return self.__spatial_block
553
554
555
    def get_points(self):
556
        """!
557
        @brief Return points that covers by the BANG-block.
558
559
        @return (list) List of point indexes that are covered by the block.
560
561
        """
562
        if self.__points is None:
563
            self.__cache_covered_data()
564
565
        return self.__points
566
567
568
    def set_cluster(self, index):
569
        """!
570
        @brief Assign cluster to the BANG-block by index.
571
572
        @param[in] index (uint): Index cluster that is assigned to BANG-block.
573
574
        """
575
        self.__cluster = index
576
577
578
    def is_neighbor(self, block):
579
        """!
580
        @brief Performs calculation to check whether specified block is neighbor to the current.
581
582
        @param[in] block (bang_block): Other BANG-block that should be checked for neighborhood.
583
584
        @return (bool) True if blocks are neighbors, False if blocks are not neighbors.
585
586
        """
587
        return self.get_spatial_block().is_neighbor(block.get_spatial_block())
588
589
590
    def split(self, split_dimension, cache_points):
591
        """!
592
        @brief Split BANG-block into two new blocks in specified dimension.
593
594
        @param[in] split_dimension (uint): Dimension where block should be split.
595
        @param[in] cache_points (bool): If True then covered points are cached. Used for leaf blocks.
596
597
        @return (tuple) Pair of BANG-block that were formed from the current.
598
599
        """
600
        left_region_number = self.__region_number
601
        right_region_number = self.__region_number + 2 ** self.__level
602
603
        first_spatial_block, second_spatial_block = self.__spatial_block.split(split_dimension)
604
605
        left = bang_block(self.__data, left_region_number, self.__level + 1, first_spatial_block, cache_points)
606
        right = bang_block(self.__data, right_region_number, self.__level + 1, second_spatial_block, cache_points)
607
608
        return left, right
609
610
611
    def __calculate_density(self):
612
        """!
613
        @brief Calculates BANG-block density.
614
615
        @return (double) BANG-block density.
616
617
        """
618
        return self.__get_amount_points() / self.__spatial_block.get_volume()
619
620
621
    def __get_amount_points(self):
622
        """!
623
        @brief Count covered points by the BANG-block and if cache is enable then covered points are stored.
624
625
        @return (uint) Amount of covered points.
626
627
        """
628
        amount = 0
629
        for index in range(len(self.__data)):
630
            if self.__data[index] in self.__spatial_block:
631
                self.__cache_point(index)
632
                amount += 1
633
634
        return amount
635
636
637
    def __cache_covered_data(self):
638
        """!
639
        @brief Cache covered data.
640
641
        """
642
        self.__cache_points = True
643
        self.__points = []
644
645
        for index_point in range(len(self.__data)):
646
            if self.__data[index_point] in self.__spatial_block:
647
                self.__cache_point(index_point)
648
649
650
    def __cache_point(self, index):
651
        """!
652
        @brief Store index points.
653
654
        @param[in] index (uint): Index point that should be stored.
655
656
        """
657
        if self.__cache_points:
658
            if self.__points is None:
659
                self.__points = []
660
661
            self.__points.append(index)
662
663
664
665
class bang:
666
    """!
667
    @brief Class implements BANG grid based clustering algorithm.
668
    @details BANG clustering algorithms uses a multidimensional grid structure to organize the value space surrounding
669
              the pattern values. The patterns are grouped into blocks and clustered with respect to the blocks by
670
              a topological neighbor search algorithm @cite inproceedings::bang::1.
671
672
    Code example of BANG usage:
673
    @code
674
        from pyclustering.cluster.bang import bang, bang_visualizer
675
        from pyclustering.utils import read_sample
676
        from pyclustering.samples.definitions import FCPS_SAMPLES
677
678
        # Read data three dimensional data.
679
        data = read_sample(FCPS_SAMPLES.SAMPLE_CHAINLINK)
680
681
        # Prepare algorithm's parameters.
682
        levels = 11
683
684
        # Create instance of BANG algorithm.
685
        bang_instance = bang(data, levels)
686
        bang_instance.process()
687
688
        # Obtain clustering results.
689
        clusters = bang_instance.get_clusters()
690
        noise = bang_instance.get_noise()
691
        directory = bang_instance.get_directory()
692
        dendrogram = bang_instance.get_dendrogram()
693
694
        # Visualize BANG clustering results.
695
        bang_visualizer.show_blocks(directory)
696
        bang_visualizer.show_dendrogram(dendrogram)
697
        bang_visualizer.show_clusters(data, clusters, noise)
698
    @endcode
699
700
    There is visualization of BANG-clustering of three-dimensional data 'chainlink'. BANG-blocks that were formed during
701
    processing are shown on following figure. The darkest color means highest density, blocks that does not cover points
702
    are transparent:
703
    @image html bang_blocks_chainlink.png "Fig. 1. BANG-blocks that cover input data."
704
705
    Here is obtained dendrogram that can be used for further analysis to improve clustering results:
706
    @image html bang_dendrogram_chainlink.png "Fig. 2. BANG dendrogram where the X-axis contains BANG-blocks, the Y-axis contains density."
707
708
    BANG clustering result of 'chainlink' data:
709
    @image html bang_clustering_chainlink.png "Fig. 3. BANG clustering result. Data: 'chainlink'."
710
711
    """
712
713
    def __init__(self, data, levels, density_threshold=0.0, ccore=False):
714
        """!
715
        @brief Create BANG clustering algorithm.
716
717
        @param[in] data (list): Input data (list of points) that should be clustered.
718
        @param[in] levels (uint): Amount of levels in tree (how many times block should be split).
719
        @param[in] density_threshold (double): If block density is smaller than this value then contained data by this
720
                    block is considered as a noise.
721
        @param[in] ccore (bool): Reserved positional argument - not used yet.
722
723
        """
724
        self.__data = data
725
        self.__levels = levels
726
        self.__directory = None
727
        self.__clusters = []
728
        self.__noise = []
729
        self.__cluster_blocks = []
730
        self.__dendrogram = []
731
        self.__density_threshold = density_threshold
732
        self.__ccore = ccore
733
734
        self.__validate_arguments()
735
736
737
    def process(self):
738
        """!
739
        @brief Performs clustering process in line with rules of BANG clustering algorithm.
740
741
        @see get_clusters()
742
        @see get_noise()
743
        @see get_directory()
744
        @see get_dendrogram()
745
746
        """
747
        self.__directory = bang_directory(self.__data, self.__levels, self.__density_threshold)
748
        self.__allocate_clusters()
749
750
751
    def get_clusters(self):
752
        """!
753
        @brief Returns allocated clusters.
754
755
        @remark Allocated clusters are returned only after data processing (method process()). Otherwise empty list is returned.
756
757
        @return (list) List of allocated clusters, each cluster contains indexes of objects in list of data.
758
759
        @see process()
760
        @see get_noise()
761
762
        """
763
        return self.__clusters
764
765
766
    def get_noise(self):
767
        """!
768
        @brief Returns allocated noise.
769
770
        @remark Allocated noise is returned only after data processing (method process()). Otherwise empty list is returned.
771
772
        @return (list) List of indexes that are marked as a noise.
773
774
        @see process()
775
        @see get_clusters()
776
777
        """
778
        return self.__noise
779
780
781
    def get_directory(self):
782
        """!
783
        @brief Returns grid directory that describes grid of the processed data.
784
785
        @remark Grid directory is returned only after data processing (method process()). Otherwise None value is returned.
786
787
        @return (bang_directory) BANG directory that describes grid of process data.
788
789
        @see process()
790
791
        """
792
        return self.__directory
793
794
795
    def get_dendrogram(self):
796
        """!
797
        @brief Returns dendrogram of clusters.
798
        @details Dendrogram is created in following way: the density indices of all regions are calculated and sorted
799
                  in decreasing order for each cluster during clustering process.
800
801
        @remark Dendrogram is returned only after data processing (method process()). Otherwise empty list is returned.
802
803
        """
804
        return self.__dendrogram
805
806
807
    def get_cluster_encoding(self):
808
        """!
809
        @brief Returns clustering result representation type that indicate how clusters are encoded.
810
811
        @return (type_encoding) Clustering result representation.
812
813
        @see get_clusters()
814
815
        """
816
817
        return type_encoding.CLUSTER_INDEX_LIST_SEPARATION
818
819
820
    def __validate_arguments(self):
821
        """!
822
        @brief Check input arguments of BANG algorithm and if one of them is not correct then appropriate exception
823
                is thrown.
824
825
        """
826
        if self.__levels <= 0:
827
            raise ValueError("Incorrect amount of levels '%d'. Level value should be greater than 0." % self.__levels)
828
829
        if len(self.__data) == 0:
830
            raise ValueError("Empty input data. Data should contain at least one point.")
831
832
        if self.__density_threshold < 0:
833
            raise ValueError("Incorrect density threshold '%f'. Density threshold should not be negative." % self.__density_threshold)
834
835
836
    def __allocate_clusters(self):
837
        """!
838
        @brief Performs cluster allocation using leafs of tree in BANG directory (the smallest cells).
839
840
        """
841
        leaf_blocks = self.__directory.get_leafs()
842
        unhandled_block_indexes = set([i for i in range(len(leaf_blocks)) if leaf_blocks[i].get_density() > self.__density_threshold])
843
        appropriate_block_indexes = set(unhandled_block_indexes)
0 ignored issues
show
Unused Code introduced by
The variable appropriate_block_indexes seems to be unused.
Loading history...
844
845
        current_block = self.__find_block_center(leaf_blocks, unhandled_block_indexes)
846
        cluster_index = 0
847
848
        while current_block is not None:
849
            if current_block.get_density() <= self.__density_threshold:
850
                break
851
852
            self.__expand_cluster_block(current_block, cluster_index, leaf_blocks, unhandled_block_indexes)
853
854
            current_block = self.__find_block_center(leaf_blocks, unhandled_block_indexes)
855
            cluster_index += 1
856
857
        self.__store_clustering_results(cluster_index, leaf_blocks)
858
859
860
    def __expand_cluster_block(self, block, cluster_index, leaf_blocks, unhandled_block_indexes):
861
        """!
862
        @brief Expand cluster from specific block that is considered as a central block.
863
864
        @param[in] block (bang_block): Block that is considered as a central block for cluster.
865
        @param[in] cluster_index (uint): Index of cluster that is assigned to blocks that forms new cluster.
866
        @param[in] leaf_blocks (list): Leaf BANG-blocks that are considered during cluster formation.
867
        @param[in] unhandled_block_indexes (set): Set of candidates (BANG block indexes) to become a cluster member. The
868
                    parameter helps to reduce traversing among BANG-block providing only restricted set of block that
869
                    should be considered.
870
871
        """
872
873
        block.set_cluster(cluster_index)
874
        self.__update_cluster_dendrogram(cluster_index, [block])
875
876
        neighbors = self.__find_block_neighbors(block, leaf_blocks, unhandled_block_indexes)
877
        self.__update_cluster_dendrogram(cluster_index, neighbors)
878
879
        for neighbor in neighbors:
880
            neighbor.set_cluster(cluster_index)
881
            neighbor_neighbors = self.__find_block_neighbors(neighbor, leaf_blocks, unhandled_block_indexes)
882
            self.__update_cluster_dendrogram(cluster_index, neighbor_neighbors)
883
884
            neighbors += neighbor_neighbors
885
886
887
    def __store_clustering_results(self, amount_clusters, leaf_blocks):
888
        """!
889
        @brief Stores clustering results in a convenient way.
890
891
        @param[in] amount_clusters (uint): Amount of cluster that was allocated during processing.
892
        @param[in] leaf_blocks (list): Leaf BANG-blocks (the smallest cells).
893
894
        """
895
        self.__clusters = [[] for _ in range(amount_clusters)]
896
        for block in leaf_blocks:
897
            index = block.get_cluster()
898
899
            if index is not None:
900
                self.__clusters[index] += block.get_points()
901
            else:
902
                self.__noise += block.get_points()
903
904
        self.__clusters = [ list(set(cluster)) for cluster in self.__clusters ]
905
        self.__noise = list(set(self.__noise))
906
907
908
    def __find_block_center(self, level_blocks, unhandled_block_indexes):
909
        """!
910
        @brief Search block that is cluster center for new cluster.
911
912
        @return (bang_block) Central block for new cluster, if cluster is not found then None value is returned.
913
914
        """
915
        for i in reversed(range(len(level_blocks))):
916
            if level_blocks[i].get_density() <= self.__density_threshold:
917
                return None
918
919
            if level_blocks[i].get_cluster() is None:
920
                unhandled_block_indexes.remove(i)
921
                return level_blocks[i]
922
923
        return None
924
925
926
    def __find_block_neighbors(self, block, level_blocks, unhandled_block_indexes):
927
        """!
928
        @brief Search block neighbors that are parts of new clusters (density is greater than threshold and that are
929
                not cluster members yet), other neighbors are ignored.
930
931
        @param[in] block (bang_block): BANG-block for which neighbors should be found (which can be part of cluster).
932
        @param[in] level_blocks (list): BANG-blocks on specific level.
933
        @param[in] unhandled_block_indexes (set): Blocks that have not been processed yet.
934
935
        @return (list) Block neighbors that can become part of cluster.
936
937
        """
938
        neighbors = []
939
940
        handled_block_indexes = []
941
        for unhandled_index in unhandled_block_indexes:
942
            if block.is_neighbor(level_blocks[unhandled_index]):
943
                handled_block_indexes.append(unhandled_index)
944
                neighbors.append(level_blocks[unhandled_index])
945
946
                # Maximum number of neighbors is eight
947
                if len(neighbors) == 8:
948
                    break
949
950
        for handled_index in handled_block_indexes:
951
            unhandled_block_indexes.remove(handled_index)
952
953
        return neighbors
954
955
956
    def __update_cluster_dendrogram(self, index_cluster, blocks):
957
        """!
958
        @brief Append clustered blocks to dendrogram.
959
960
        @param[in] index_cluster (uint): Cluster index that was assigned to blocks.
961
        @param[in] blocks (list): Blocks that were clustered.
962
963
        """
964
        if len(self.__dendrogram) <= index_cluster:
965
            self.__dendrogram.append([])
966
967
        blocks = sorted(blocks, key=lambda block: block.get_density(), reverse=True)
968
        self.__dendrogram[index_cluster] += blocks
969