Completed
Push — 0.8.dev ( 108016...85e2ef )
by Andrei
01:02
created

bang_visualizer   A

Complexity

Total Complexity 15

Size/Duplication

Total Lines 153
Duplicated Lines 0 %

Importance

Changes 0
Metric Value
dl 0
loc 153
rs 10
c 0
b 0
f 0
wmc 15

6 Methods

Rating   Name   Duplication   Size   Complexity  
A __get_rectangle_description() 0 20 2
B show_dendrogram() 0 24 3
B show_blocks() 0 28 4
A __draw_blocks() 0 15 2
A __draw_two_dimension_data() 0 19 3
B __draw_block() 0 26 1
1
"""!
2
3
@brief Cluster analysis algorithm: BANG.
4
@details Implementation based on paper @cite inproceedings::bang::1.
5
6
@authors Andrei Novikov ([email protected])
7
@date 2014-2018
8
@copyright GNU Public License
9
10
@cond GNU_PUBLIC_LICENSE
11
    PyClustering is free software: you can redistribute it and/or modify
12
    it under the terms of the GNU General Public License as published by
13
    the Free Software Foundation, either version 3 of the License, or
14
    (at your option) any later version.
15
16
    PyClustering is distributed in the hope that it will be useful,
17
    but WITHOUT ANY WARRANTY; without even the implied warranty of
18
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
19
    GNU General Public License for more details.
20
21
    You should have received a copy of the GNU General Public License
22
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
23
@endcond
24
25
"""
26
27
28
import matplotlib
0 ignored issues
show
Configuration introduced by
The import matplotlib could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Loading history...
29
import matplotlib.gridspec as gridspec;
0 ignored issues
show
Configuration introduced by
The import matplotlib.gridspec could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Loading history...
30
import matplotlib.pyplot as plt
0 ignored issues
show
Configuration introduced by
The import matplotlib.pyplot could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Loading history...
31
import matplotlib.patches as patches
0 ignored issues
show
Configuration introduced by
The import matplotlib.patches could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Loading history...
32
33
import itertools
34
35
from pyclustering.cluster.encoder import type_encoding
36
37
from pyclustering.utils import data_corners
38
from pyclustering.utils.color import color as color_list
39
40
41
42
class bang_visualizer:
43
    """!
44
    @brief Visualizer of BANG algorithm's results.
45
    @details BANG visualizer provides visualization services that are specific for BANG algorithm.
46
47
    """
48
49
50
    __maximum_density_alpha = 0.6
51
52
53
    @staticmethod
54
    def show_blocks(directory):
55
        """!
56
        @brief Show BANG-blocks (leafs only) in data space.
57
        @details BANG-blocks represents grid that was used for clustering process.
58
59
        @param[in] directory (bang_directory): Directory that was created by BANG algorithm during clustering process.
60
61
        """
62
63
        dimension = len(directory.get_data()[0])
64
65
        amount_canvases = 1
66
        if dimension > 1:
67
            amount_canvases = int(dimension * (dimension - 1) / 2)
68
69
        figure = plt.figure();
70
        grid_spec = gridspec.GridSpec(1, amount_canvases);
71
72
        pairs = list(itertools.combinations(range(dimension), 2))
73
        if len(pairs) == 0: pairs = [(0, 0)]
74
75
        for index in range(amount_canvases):
76
            ax = figure.add_subplot(grid_spec[index]);
77
            bang_visualizer.__draw_blocks(ax, directory.get_leafs(), pairs[index])
78
            bang_visualizer.__draw_two_dimension_data(ax, directory.get_data(), pairs[index])
79
80
        plt.show()
81
82
83
    @staticmethod
84
    def show_dendrogram(dendrogram):
85
        """!
86
        @brief Display dendrogram of BANG-blocks.
87
88
        @param[in] dendrogram (list): List representation of dendrogram of BANG-blocks.
89
90
        @see bang.get_dendrogram()
91
92
        """
93
        plt.figure()
94
        axis = plt.subplot(1, 1, 1)
95
96
        current_position = 0
97
        for index_cluster in range(len(dendrogram)):
98
            densities = [ block.get_density() for block in dendrogram[index_cluster] ]
99
            xrange = range(current_position, current_position + len(densities))
100
101
            axis.bar(xrange, densities, 1.0, linewidth=0.0, color=color_list.get_color(index_cluster))
102
103
            current_position += len(densities)
104
105
        plt.xlim([-0.5, current_position - 0.5])
106
        plt.show()
107
108
109
    @staticmethod
110
    def __draw_two_dimension_data(ax, data, pair):
111
        """!
112
        @brief Display data in two-dimensional canvas.
113
114
        @param[in] ax (Axis): Canvas where data should be displayed.
115
        @param[in] data (list): Data points that should be displayed.
116
        @param[in] pair (tuple): Pair of dimension indexes.
117
118
        """
119
        ax.set_xlabel("x%d" % pair[0])
120
        ax.set_ylabel("x%d" % pair[1])
121
122
        for point in data:
123
            if len(data[0]) > 1:
124
                ax.plot(point[pair[0]], point[pair[1]], color='red', marker='.')
125
            else:
126
                ax.plot(point[pair[0]], 0, color='red', marker='.')
127
                ax.yaxis.set_ticklabels([]);
128
129
130
    @staticmethod
131
    def __draw_blocks(ax, blocks, pair):
132
        """!
133
        @brief Display BANG-blocks on specified figure.
134
135
        @param[in] ax (Axis): Axis where bang-blocks should be displayed.
136
        @param[in] blocks (list): List of blocks that should be displyed.
137
        @param[in] pair (tuple): Pair of coordinate index that should be displyed.
138
139
        """
140
        ax.grid(False)
141
142
        density_scale = blocks[-1].get_density()
143
        for block in blocks:
144
            bang_visualizer.__draw_block(ax, pair, block, density_scale)
145
146
147
    @staticmethod
148
    def __draw_block(ax, pair, block, density_scale):
149
        """!
150
        @brief Display BANG-block on the specified ax.
151
152
        @param[in] ax (Axis): Axis where block should be displayed.
153
        @param[in] pair (tuple): Pair of coordinate index that should be displayed.
154
        @param[in] block (bang_block): BANG-block that should be displayed.
155
        @param[in] density_scale (double): Max density to display density of the block by appropriate tone.
156
157
        """
158
        max_corner, min_corner = bang_visualizer.__get_rectangle_description(block, pair)
159
160
        belong_cluster = block.get_cluster() is not None
161
162
        density_scale = bang_visualizer.__maximum_density_alpha * block.get_density() / density_scale
163
164
        face_color = matplotlib.colors.to_rgba('blue', alpha=density_scale)
165
        edge_color = matplotlib.colors.to_rgba('black', alpha=1.0)
166
167
        rect = patches.Rectangle(min_corner, max_corner[0] - min_corner[0], max_corner[1] - min_corner[1],
168
                                 fill=belong_cluster,
169
                                 facecolor=face_color,
170
                                 edgecolor=edge_color,
171
                                 linewidth=0.5)
172
        ax.add_patch(rect)
173
174
175
    @staticmethod
176
    def __get_rectangle_description(block, pair):
177
        """!
178
        @brief Create rectangle description for block in specific dimension.
179
180
        @param[in] pair (tuple): Pair of coordinate index that should be displayed.
181
        @param[in] block (bang_block): BANG-block that should be displayed
182
183
        @return (tuple) Pair of corners that describes rectangle.
184
185
        """
186
        max_corner, min_corner = block.get_spatial_block().get_corners()
187
188
        max_corner = [max_corner[pair[0]], max_corner[pair[1]]]
189
        min_corner = [min_corner[pair[0]], min_corner[pair[1]]]
190
191
        if pair == (0, 0):
192
            max_corner[1], min_corner[1] = 1.0, -1.0
193
194
        return max_corner, min_corner
195
196
197
class bang_directory:
198
    """!
199
    @brief BANG directory stores BANG-blocks that represents grid in data space.
200
    @details The directory build BANG-blocks in binary tree manner. Leafs of the tree stored separately to provide
201
              a direct access to the leafs that should be analysed. Leafs cache data-points.
202
203
    """
204
    def __init__(self, data, levels, density_threshold=0.0):
205
        """!
206
        @brief Create BANG directory - basically tree structure with direct access to leafs.
207
208
        @param[in] levels (uint): Height of the blocks tree.
209
        @param[in] density_threshold (double): The lowest level of density when contained data by bang-block is
210
                    considered as a noise and there is no need to split it till the last level.
211
212
        """
213
        self.__data = data
214
        self.__levels = levels
215
        self.__density_threshold = density_threshold
216
        self.__leafs = []
217
        self.__root = None
218
219
        self.__create_directory()
220
221
222
    def get_data(self):
223
        """!
224
        @brief Return data that is stored in the directory.
225
226
        @return (list) List of points that represents stored data.
227
228
        """
229
        return self.__data
230
231
232
    def get_leafs(self):
233
        """!
234
        @brief Return leafs - the smallest blocks.
235
        @details Some leafs can be bigger than others because splitting is not performed for blocks whose density is
236
                  less than threshold.
237
238
        @return (list) List of blocks that are leafs of BANG directory.
239
240
        """
241
        return self.__leafs
242
243
244
    def __create_directory(self):
245
        """!
246
        @brief Create BANG directory as a tree with separate storage for leafs.
247
248
        """
249
250
        min_corner, max_corner = data_corners(self.__data)
251
        data_block = spatial_block(max_corner, min_corner)
252
253
        cache_require = (self.__levels == 1)
254
        self.__root = bang_block(self.__data, 0, 0, data_block, cache_require)
255
256
        if cache_require:
257
            self.__leafs.append(self.__root)
258
        else:
259
            self.__build_directory_levels()
260
261
262
    def __build_directory_levels(self):
263
        """!
264
        @brief Build levels of direction if amount of level is greater than one.
265
266
        """
267
        previous_level_blocks = [ self.__root ]
268
        for level in range(1, self.__levels):
269
            previous_level_blocks = self.__build_level(previous_level_blocks, level)
270
271
        self.__leafs = sorted(self.__leafs, key=lambda block: block.get_density())
272
273
274
    def __build_level(self, previous_level_blocks, level):
275
        """!
276
        @brief Build new level of directory.
277
278
        @param[in] previous_level_blocks (list): BANG-blocks on the previous level.
279
        @param[in] level (uint): Level number that should be built.
280
281
        @return (list) New block on the specified level.
282
283
        """
284
        current_level_blocks = []
285
286
        split_dimension = level % len(self.__data[0])
287
        cache_require = (level == self.__levels - 1)
288
289
        for block in previous_level_blocks:
290
            self.__split_block(block, split_dimension, cache_require, current_level_blocks)
291
292
        if cache_require:
293
            self.__leafs += current_level_blocks
294
295
        return current_level_blocks
296
297
298
    def __split_block(self, block, split_dimension, cache_require, current_level_blocks):
299
        """!
300
        @brief Split specific block in specified dimension.
301
        @details Split is not performed for block whose density is lower than threshold value, such blocks are putted to
302
                  leafs.
303
304
        @param[in] block (bang_block): BANG-block that should be split.
305
        @param[in] split_dimension (uint): Dimension at which splitting should be performed.
306
        @param[in] cache_require (bool): Defines when points in cache should be stored during density calculation.
307
        @param[in|out] current_level_blocks (list): Block storage at the current level where new blocks should be added.
308
309
        """
310
        if block.get_density() <= self.__density_threshold:
311
            self.__leafs.append(block)
312
313
        else:
314
            left, right = block.split(split_dimension, cache_require)
315
            current_level_blocks.append(left)
316
            current_level_blocks.append(right)
317
318
319
320
class spatial_block:
321
    """!
322
    @brief Geometrical description of BANG block in data space.
323
    @details Provides services related to spatial function and used by bang_block
324
325
    @see bang_block
326
327
    """
328
329
    def __init__(self, max_corner, min_corner):
330
        """!
331
        @brief Creates spatial block in data space.
332
333
        @param[in] max_corner (array_like): Maximum corner coordinates of the block.
334
        @param[in] min_corner (array_like): Minimal corner coordinates of the block.
335
336
        """
337
        self.__max_corner = max_corner
338
        self.__min_corner = min_corner
339
        self.__volume = self.__calculate_volume()
340
341
342
    def __str__(self):
343
        """!
344
        @brief Returns string block description.
345
346
        @return String representation of the block.
347
348
        """
349
        return "(max: %s; min: %s)" % (self.__max_corner, self.__min_corner)
350
351
352
    def __contains__(self, point):
353
        """!
354
        @brief Point is considered as contained if it lies in block (belong to it).
355
356
        @return (bool) True if point is in block, otherwise False.
357
358
        """
359
        for i in range(len(point)):
360
            if point[i] < self.__min_corner[i] or point[i] > self.__max_corner[i]:
361
                return False
362
363
        return True
364
365
366
    def get_corners(self):
367
        """!
368
        @brief Return spatial description of current block.
369
370
        @return (tuple) Pair of maximum and minimum corners (max_corner, min_corner).
371
372
        """
373
        return self.__max_corner, self.__min_corner
374
375
376
    def get_volume(self):
377
        """!
378
        @brief Returns volume of current block.
379
        @details Volume block has uncommon mining here: for 1D is length of a line, for 2D is square of rectangle,
380
                  for 3D is volume of 3D figure, and for ND is volume of ND figure.
381
382
        @return (double) Volume of current block.
383
384
        """
385
        return self.__volume
386
387
388
    def split(self, dimension):
389
        """!
390
        @brief Split current block into two spatial blocks in specified dimension.
391
392
        @param[in] dimension (uint): Dimension where current block should be split.
393
394
        @return (tuple) Pair of new split blocks from current block.
395
396
        """
397
        first_max_corner = self.__max_corner[:]
398
        second_min_corner = self.__min_corner[:]
399
400
        split_border = (self.__max_corner[dimension] + self.__min_corner[dimension]) / 2.0
401
402
        first_max_corner[dimension] = split_border
403
        second_min_corner[dimension] = split_border
404
405
        return spatial_block(first_max_corner, self.__min_corner), spatial_block(self.__max_corner, second_min_corner)
406
407
408
    def is_neighbor(self, block):
409
        """!
410
        @brief Performs calculation to identify whether specified block is neighbor of current block.
411
412
        @param[in] block (spatial_block): Another block that is check whether it is neighbor.
413
414
        @return (bool) True is blocks are neighbors, False otherwise.
415
416
        """
417
        if block is not self:
418
            block_max_corner, _ = block.get_corners()
419
            dimension = len(block_max_corner)
420
            neighborhood_score = self.__calculate_neighborhood(block_max_corner)
421
422
            if neighborhood_score == dimension:
423
                return True
424
425
        return False
426
427
428
    def __calculate_neighborhood(self, block_max_corner):
429
        """!
430
        @brief Calculates neighborhood score that defined whether blocks are neighbors.
431
432
        @param[in] block_max_corner (list): Maximum coordinates of other block.
433
434
        @return (uint) Neighborhood score.
435
436
        """
437
        dimension = len(block_max_corner)
438
439
        length_edges = [self.__max_corner[i] - self.__min_corner[i] for i in range(dimension)]
440
441
        neighborhood_score = 0
442
        for i in range(dimension):
443
            diff = abs(block_max_corner[i] - self.__max_corner[i])
444
445
            if diff <= length_edges[i] + length_edges[i] * 0.0001:
446
                neighborhood_score += 1
447
448
        return neighborhood_score
449
450
451
    def __calculate_volume(self):
452
        """!
453
        @brief Calculates volume of current spatial block.
454
455
        @return (double) Volume of current spatial block.
456
457
        """
458
        volume = self.__max_corner[0] - self.__min_corner[0]
459
        for i in range(1, len(self.__max_corner)):
460
            volume *= self.__max_corner[i] - self.__min_corner[i]
461
462
        return volume
463
464
465
466
class bang_block:
467
    """!
468
    @brief BANG-block that represent spatial region in data space.
469
470
    """
471
    def __init__(self, data, region, level, space_block, cache_points=False):
472
        self.__data = data
473
        self.__region_number = region
474
        self.__level = level
475
        self.__spatial_block = space_block
476
        self.__cache_points = cache_points
477
478
        self.__cluster = None
479
        self.__points = None
480
        self.__density = self.__calculate_density()
481
482
483
    def __str__(self):
484
        """!
485
        @brief Returns string representation of BANG-block using region number and level where block is located.
486
487
        """
488
        return "(" + str(self.__region_number) + ", " + str(self.__level) + ")"
489
490
491
    def get_region(self):
492
        """!
493
        @brief Returns region number of BANG-block.
494
        @details Region number is unique on among region numbers on a directory level. Pair of region number and level
495
                  is unique for all directory.
496
497
        @return (uint) Region number.
498
499
        """
500
        return self.__region_number
501
502
503
    def get_density(self):
504
        """!
505
        @brief Returns density of the BANG-block.
506
507
        @return (double) BANG-block density.
508
509
        """
510
        return self.__density
511
512
513
    def get_cluster(self):
514
        """!
515
        @brief Return index of cluster to which the BANG-block belongs to.
516
        @details Index of cluster may have None value if the block was not assigned to any cluster.
517
518
        @return (uint) Index of cluster or None if the block does not belong to any cluster.
519
520
        """
521
        return self.__cluster
522
523
524
    def get_spatial_block(self):
525
        """!
526
        @brief Return spatial block - BANG-block description in data space.
527
528
        @return (spatial_block) Spatial block of the BANG-block.
529
530
        """
531
        return self.__spatial_block
532
533
534
    def get_points(self):
535
        """!
536
        @brief Return points that covers by the BANG-block.
537
538
        @return (list) List of point indexes that are covered by the block.
539
540
        """
541
        if self.__points is None:
542
            self.__cache_covered_data()
543
544
        return self.__points
545
546
547
    def set_cluster(self, index):
548
        """!
549
        @brief Assign cluster to the BANG-block by index.
550
551
        @param[in] index (uint): Index cluster that is assigned to BANG-block.
552
553
        """
554
        self.__cluster = index
555
556
557
    def is_neighbor(self, block):
558
        """!
559
        @brief Performs calculation to check whether specified block is neighbor to the current.
560
561
        @param[in] block (bang_block): Other BANG-block that should be checked for neighborhood.
562
563
        @return (bool) True if blocks are neighbors, False if blocks are not neighbors.
564
565
        """
566
        return self.get_spatial_block().is_neighbor(block.get_spatial_block())
567
568
569
    def split(self, split_dimension, cache_points):
570
        """!
571
        @brief Split BANG-block into two new blocks in specified dimension.
572
573
        @param[in] split_dimension (uint): Dimension where block should be split.
574
        @param[in] cache_points (bool): If True then covered points are cached. Used for leaf blocks.
575
576
        @return (tuple) Pair of BANG-block that were formed from the current.
577
578
        """
579
        left_region_number = self.__region_number
580
        right_region_number = self.__region_number + 2 ** self.__level
581
582
        first_spatial_block, second_spatial_block = self.__spatial_block.split(split_dimension)
583
584
        left = bang_block(self.__data, left_region_number, self.__level + 1, first_spatial_block, cache_points)
585
        right = bang_block(self.__data, right_region_number, self.__level + 1, second_spatial_block, cache_points)
586
587
        return left, right
588
589
590
    def __calculate_density(self):
591
        """!
592
        @brief Calculates BANG-block density.
593
594
        @return (double) BANG-block density.
595
596
        """
597
        return self.__get_amount_points() / self.__spatial_block.get_volume()
598
599
600
    def __get_amount_points(self):
601
        """!
602
        @brief Count covered points by the BANG-block and if cache is enable then covered points are stored.
603
604
        @return (uint) Amount of covered points.
605
606
        """
607
        amount = 0
608
        for index in range(len(self.__data)):
609
            if self.__data[index] in self.__spatial_block:
610
                self.__cache_point(index)
611
                amount += 1
612
613
        return amount
614
615
616
    def __cache_covered_data(self):
617
        """!
618
        @brief Cache covered data.
619
620
        """
621
        self.__cache_points = True
622
        self.__points = []
623
624
        for index_point in range(len(self.__data)):
625
            if self.__data[index_point] in self.__spatial_block:
626
                self.__cache_point(index_point)
627
628
629
    def __cache_point(self, index):
630
        """!
631
        @brief Store index points.
632
633
        @param[in] index (uint): Index point that should be stored.
634
635
        """
636
        if self.__cache_points:
637
            if self.__points is None:
638
                self.__points = []
639
640
            self.__points.append(index)
641
642
643
644
class bang:
645
    """!
646
    @brief Class implements BANG grid based clustering algorithm.
647
    @details BANG clustering algorithms uses a multidimensional grid structure to organize the value space surrounding
648
              the pattern values. The patterns are grouped into blocks and clustered with respect to the blocks by
649
              a topological neighbor search algorithm @cite inproceedings::bang::1.
650
651
    """
652
653
    def __init__(self, data, levels, density_threshold=0.0, ccore=False):
654
        """!
655
        @brief Create BANG clustering algorithm.
656
657
        @param[in] data (list): Input data (list of points) that should be clustered.
658
        @param[in] levels (uint): Amount of levels in tree (how many times block should be split).
659
        @param[in] density_threshold (double): If block density is smaller than this value then contained data by this
660
                    block is considered as a noise.
661
        @param[in] ccore (bool): Reserved positional argument - not used yet.
662
663
        """
664
        self.__data = data
665
        self.__levels = levels
666
        self.__directory = None
667
        self.__clusters = []
668
        self.__noise = []
669
        self.__cluster_blocks = []
670
        self.__dendrogram = []
671
        self.__density_threshold = density_threshold
672
        self.__ccore = ccore
673
674
675
    def process(self):
676
        """!
677
        @brief Performs clustering process in line with rules of BANG clustering algorithm.
678
679
        @see get_clusters()
680
        @see get_noise()
681
        @see get_directory()
682
        @see get_dendrogram()
683
684
        """
685
        self.__validate_arguments()
686
687
        self.__directory = bang_directory(self.__data, self.__levels, self.__density_threshold)
688
        self.__allocate_clusters()
689
690
691
    def get_clusters(self):
692
        """!
693
        @brief Returns allocated clusters.
694
695
        @remark Allocated clusters are returned only after data processing (method process()). Otherwise empty list is returned.
696
697
        @return (list) List of allocated clusters, each cluster contains indexes of objects in list of data.
698
699
        @see process()
700
        @see get_noise()
701
702
        """
703
        return self.__clusters
704
705
706
    def get_noise(self):
707
        """!
708
        @brief Returns allocated noise.
709
710
        @remark Allocated noise is returned only after data processing (method process()). Otherwise empty list is returned.
711
712
        @return (list) List of indexes that are marked as a noise.
713
714
        @see process()
715
        @see get_clusters()
716
717
        """
718
        return self.__noise
719
720
721
    def get_directory(self):
722
        """!
723
        @brief Returns grid directory that describes grid of the processed data.
724
725
        @remark Grid directory is returned only after data processing (method process()). Otherwise None value is returned.
726
727
        @return (bang_directory) BANG directory that describes grid of process data.
728
729
        @see process()
730
731
        """
732
        return self.__directory
733
734
735
    def get_dendrogram(self):
736
        """!
737
        @brief Returns dendrogram of clusters.
738
        @details Dendrogram is created in following way: the density indices of all regions are calculated and sorted
739
                  in decreasing order for each cluster during clustering process.
740
741
        @remark Dendrogram is returned only after data processing (method process()). Otherwise empty list is returned.
742
743
        """
744
        return self.__dendrogram
745
746
747
    def get_cluster_encoding(self):
748
        """!
749
        @brief Returns clustering result representation type that indicate how clusters are encoded.
750
751
        @return (type_encoding) Clustering result representation.
752
753
        @see get_clusters()
754
755
        """
756
757
        return type_encoding.CLUSTER_INDEX_LIST_SEPARATION
758
759
760
    def __validate_arguments(self):
761
        """!
762
        @brief Check input arguments of BANG algorithm and if one of them is not correct then appropriate exception
763
                is thrown.
764
765
        """
766
        if self.__levels <= 0:
767
            raise ValueError("Incorrect amount of levels '%d'. Level value should be greater than 0." % self.__levels)
768
769
        if len(self.__data) == 0:
770
            raise ValueError("Empty input data. Data should contain at least one point.")
771
772
        if self.__density_threshold < 0:
773
            raise ValueError("Incorrect density threshold '%f'. Density threshold should not be negative." % self.__density_threshold)
774
775
776
    def __allocate_clusters(self):
777
        """!
778
        @brief Performs cluster allocation using leafs of tree in BANG directory (the smallest cells).
779
780
        """
781
        leaf_blocks = self.__directory.get_leafs()
782
        unhandled_block_indexes = set([i for i in range(len(leaf_blocks)) if leaf_blocks[i].get_density() > self.__density_threshold])
783
        appropriate_block_indexes = set(unhandled_block_indexes)
0 ignored issues
show
Unused Code introduced by
The variable appropriate_block_indexes seems to be unused.
Loading history...
784
785
        current_block = self.__find_block_center(leaf_blocks, unhandled_block_indexes)
786
        cluster_index = 0
787
788
        while current_block is not None:
789
            if current_block.get_density() <= self.__density_threshold:
790
                break
791
792
            self.__expand_cluster_block(current_block, cluster_index, leaf_blocks, unhandled_block_indexes)
793
794
            current_block = self.__find_block_center(leaf_blocks, unhandled_block_indexes)
795
            cluster_index += 1
796
797
        self.__store_clustering_results(cluster_index, leaf_blocks)
798
799
800
    def __expand_cluster_block(self, block, cluster_index, leaf_blocks, unhandled_block_indexes):
801
        """!
802
        @brief Expand cluster from specific block that is considered as a central block.
803
804
        @param[in] block (bang_block): Block that is considered as a central block for cluster.
805
        @param[in] cluster_index (uint): Index of cluster that is assigned to blocks that forms new cluster.
806
        @param[in] leaf_blocks (list): Leaf BANG-blocks that are considered during cluster formation.
807
        @param[in] unhandled_block_indexes (set): Set of candidates (BANG block indexes) to become a cluster member. The
808
                    parameter helps to reduce traversing among BANG-block providing only restricted set of block that
809
                    should be considered.
810
811
        """
812
813
        block.set_cluster(cluster_index)
814
        self.__update_cluster_dendrogram(cluster_index, [block])
815
816
        neighbors = self.__find_block_neighbors(block, leaf_blocks, unhandled_block_indexes)
817
        self.__update_cluster_dendrogram(cluster_index, neighbors)
818
819
        for neighbor in neighbors:
820
            neighbor.set_cluster(cluster_index)
821
            neighbor_neighbors = self.__find_block_neighbors(neighbor, leaf_blocks, unhandled_block_indexes)
822
            self.__update_cluster_dendrogram(cluster_index, neighbor_neighbors)
823
824
            neighbors += neighbor_neighbors
825
826
827
    def __store_clustering_results(self, amount_clusters, leaf_blocks):
828
        """!
829
        @brief Stores clustering results in a convenient way.
830
831
        @param[in] amount_clusters (uint): Amount of cluster that was allocated during processing.
832
        @param[in] leaf_blocks (list): Leaf BANG-blocks (the smallest cells).
833
834
        """
835
        self.__clusters = [[] for _ in range(amount_clusters)]
836
        for block in leaf_blocks:
837
            index = block.get_cluster()
838
839
            if index is not None:
840
                self.__clusters[index] += block.get_points()
841
            else:
842
                self.__noise += block.get_points()
843
844
        self.__clusters = [ list(set(cluster)) for cluster in self.__clusters ]
845
        self.__noise = list(set(self.__noise))
846
847
848
    def __find_block_center(self, level_blocks, unhandled_block_indexes):
849
        """!
850
        @brief Search block that is cluster center for new cluster.
851
852
        @return (bang_block) Central block for new cluster, if cluster is not found then None value is returned.
853
854
        """
855
        for i in reversed(range(len(level_blocks))):
856
            if level_blocks[i].get_density() <= self.__density_threshold:
857
                return None
858
859
            if level_blocks[i].get_cluster() is None:
860
                unhandled_block_indexes.remove(i)
861
                return level_blocks[i]
862
863
        return None
864
865
866
    def __find_block_neighbors(self, block, level_blocks, unhandled_block_indexes):
867
        """!
868
        @brief Search block neighbors that are parts of new clusters (density is greater than threshold and that are
869
                not cluster members yet), other neighbors are ignored.
870
871
        @param[in] block (bang_block): BANG-block for which neighbors should be found (which can be part of cluster).
872
        @param[in] level_blocks (list): BANG-blocks on specific level.
873
        @param[in] unhandled_block_indexes (set): Blocks that have not been processed yet.
874
875
        @return (list) Block neighbors that can become part of cluster.
876
877
        """
878
        neighbors = []
879
880
        handled_block_indexes = []
881
        for unhandled_index in unhandled_block_indexes:
882
            if block.is_neighbor(level_blocks[unhandled_index]):
883
                handled_block_indexes.append(unhandled_index)
884
                neighbors.append(level_blocks[unhandled_index])
885
886
                # Maximum number of neighbors is eight
887
                if len(neighbors) == 8:
888
                    break
889
890
        for handled_index in handled_block_indexes:
891
            unhandled_block_indexes.remove(handled_index)
892
893
        return neighbors
894
895
896
    def __update_cluster_dendrogram(self, index_cluster, blocks):
897
        """!
898
        @brief Append clustered blocks to dendrogram.
899
900
        @param[in] index_cluster (uint): Cluster index that was assigned to blocks.
901
        @param[in] blocks (list): Blocks that were clustered.
902
903
        """
904
        if len(self.__dendrogram) <= index_cluster:
905
            self.__dendrogram.append([])
906
907
        blocks = sorted(blocks, key=lambda block: block.get_density(), reverse=True)
908
        self.__dendrogram[index_cluster] += blocks
909