Completed
Push — 0.8.dev ( 74b1ba...f0a0be )
by Andrei
01:18
created

bang_directory.__len__()   A

Complexity

Conditions 1

Size

Total Lines 8

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
cc 1
dl 0
loc 8
rs 9.4285
c 0
b 0
f 0
1
"""!
2
3
@brief Cluster analysis algorithm: BANG.
4
@details Implementation based on paper @cite inproceedings::bang::1.
5
6
@authors Andrei Novikov ([email protected])
7
@date 2014-2018
8
@copyright GNU Public License
9
10
@cond GNU_PUBLIC_LICENSE
11
    PyClustering is free software: you can redistribute it and/or modify
12
    it under the terms of the GNU General Public License as published by
13
    the Free Software Foundation, either version 3 of the License, or
14
    (at your option) any later version.
15
16
    PyClustering is distributed in the hope that it will be useful,
17
    but WITHOUT ANY WARRANTY; without even the implied warranty of
18
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
19
    GNU General Public License for more details.
20
21
    You should have received a copy of the GNU General Public License
22
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
23
@endcond
24
25
"""
26
27
28
import matplotlib
0 ignored issues
show
Configuration introduced by
The import matplotlib could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Loading history...
29
import matplotlib.gridspec as gridspec
0 ignored issues
show
Configuration introduced by
The import matplotlib.gridspec could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Loading history...
30
import matplotlib.pyplot as plt
0 ignored issues
show
Configuration introduced by
The import matplotlib.pyplot could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Loading history...
31
import matplotlib.patches as patches
0 ignored issues
show
Configuration introduced by
The import matplotlib.patches could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Loading history...
32
import matplotlib.animation as animation
0 ignored issues
show
Configuration introduced by
The import matplotlib.animation could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Loading history...
33
34
import itertools
35
36
from pyclustering.cluster import cluster_visualizer
37
from pyclustering.cluster.encoder import type_encoding
38
39
from pyclustering.utils import data_corners
40
from pyclustering.utils.color import color as color_list
41
42
43
44
class bang_visualizer:
45
    """!
46
    @brief Visualizer of BANG algorithm's results.
47
    @details BANG visualizer provides visualization services that are specific for BANG algorithm.
48
49
    """
50
51
52
    __maximum_density_alpha = 0.6
53
54
55
    @staticmethod
56
    def show_blocks(directory):
57
        """!
58
        @brief Show BANG-blocks (leafs only) in data space.
59
        @details BANG-blocks represents grid that was used for clustering process.
60
61
        @param[in] directory (bang_directory): Directory that was created by BANG algorithm during clustering process.
62
63
        """
64
65
        dimension = len(directory.get_data()[0])
66
67
        amount_canvases = 1
68
        if dimension > 1:
69
            amount_canvases = int(dimension * (dimension - 1) / 2)
70
71
        figure = plt.figure()
72
        grid_spec = gridspec.GridSpec(1, amount_canvases)
73
74
        pairs = list(itertools.combinations(range(dimension), 2))
75
        if len(pairs) == 0: pairs = [(0, 0)]
76
77
        for index in range(amount_canvases):
78
            ax = figure.add_subplot(grid_spec[index])
79
            bang_visualizer.__draw_blocks(ax, directory.get_leafs(), pairs[index])
80
            bang_visualizer.__draw_two_dimension_data(ax, directory.get_data(), pairs[index])
81
82
        plt.show()
83
84
85
    @staticmethod
86
    def show_dendrogram(dendrogram):
87
        """!
88
        @brief Display dendrogram of BANG-blocks.
89
90
        @param[in] dendrogram (list): List representation of dendrogram of BANG-blocks.
91
92
        @see bang.get_dendrogram()
93
94
        """
95
        plt.figure()
96
        axis = plt.subplot(1, 1, 1)
97
98
        current_position = 0
99
        for index_cluster in range(len(dendrogram)):
100
            densities = [ block.get_density() for block in dendrogram[index_cluster] ]
101
            xrange = range(current_position, current_position + len(densities))
102
103
            axis.bar(xrange, densities, 1.0, linewidth=0.0, color=color_list.get_color(index_cluster))
104
105
            current_position += len(densities)
106
107
        axis.set_ylabel("density")
108
        axis.set_xlabel("block")
109
        axis.xaxis.set_ticklabels([])
110
111
        plt.xlim([-0.5, current_position - 0.5])
112
        plt.show()
113
114
115
    @staticmethod
116
    def show_clusters(data, clusters, noise=None):
117
        """!
118
        @brief Display K-Means clustering results.
119
120
        @param[in] data (list): Dataset that was used for clustering.
121
        @param[in] clusters (array_like): Clusters that were allocated by the algorithm.
122
        @param[in] noise (array_like): Noise that were allocated by the algorithm.
123
124
        """
125
        visualizer = cluster_visualizer()
126
        visualizer.append_clusters(clusters, data)
127
        visualizer.append_cluster(noise or [], data, marker='x')
128
        visualizer.show()
129
130
131
    @staticmethod
132
    def __draw_two_dimension_data(ax, data, pair):
133
        """!
134
        @brief Display data in two-dimensional canvas.
135
136
        @param[in] ax (Axis): Canvas where data should be displayed.
137
        @param[in] data (list): Data points that should be displayed.
138
        @param[in] pair (tuple): Pair of dimension indexes.
139
140
        """
141
        ax.set_xlabel("x%d" % pair[0])
142
        ax.set_ylabel("x%d" % pair[1])
143
144
        for point in data:
145
            if len(data[0]) > 1:
146
                ax.plot(point[pair[0]], point[pair[1]], color='red', marker='.')
147
            else:
148
                ax.plot(point[pair[0]], 0, color='red', marker='.')
149
                ax.yaxis.set_ticklabels([])
150
151
152
    @staticmethod
153
    def __draw_blocks(ax, blocks, pair):
154
        """!
155
        @brief Display BANG-blocks on specified figure.
156
157
        @param[in] ax (Axis): Axis where bang-blocks should be displayed.
158
        @param[in] blocks (list): List of blocks that should be displyed.
159
        @param[in] pair (tuple): Pair of coordinate index that should be displyed.
160
161
        """
162
        ax.grid(False)
163
164
        density_scale = blocks[-1].get_density()
165
        for block in blocks:
166
            bang_visualizer.__draw_block(ax, pair, block, density_scale)
167
168
169 View Code Duplication
    @staticmethod
0 ignored issues
show
Duplication introduced by
This code seems to be duplicated in your project.
Loading history...
170
    def __draw_block(ax, pair, block, density_scale):
171
        """!
172
        @brief Display BANG-block on the specified ax.
173
174
        @param[in] ax (Axis): Axis where block should be displayed.
175
        @param[in] pair (tuple): Pair of coordinate index that should be displayed.
176
        @param[in] block (bang_block): BANG-block that should be displayed.
177
        @param[in] density_scale (double): Max density to display density of the block by appropriate tone.
178
179
        """
180
        max_corner, min_corner = bang_visualizer.__get_rectangle_description(block, pair)
181
182
        belong_cluster = block.get_cluster() is not None
183
184
        density_scale = bang_visualizer.__maximum_density_alpha * block.get_density() / density_scale
185
186
        face_color = matplotlib.colors.to_rgba('blue', alpha=density_scale)
187
        edge_color = matplotlib.colors.to_rgba('black', alpha=1.0)
188
189
        rect = patches.Rectangle(min_corner, max_corner[0] - min_corner[0], max_corner[1] - min_corner[1],
190
                                 fill=belong_cluster,
191
                                 facecolor=face_color,
192
                                 edgecolor=edge_color,
193
                                 linewidth=0.5)
194
        ax.add_patch(rect)
195
196
197
    @staticmethod
198
    def __get_rectangle_description(block, pair):
199
        """!
200
        @brief Create rectangle description for block in specific dimension.
201
202
        @param[in] pair (tuple): Pair of coordinate index that should be displayed.
203
        @param[in] block (bang_block): BANG-block that should be displayed
204
205
        @return (tuple) Pair of corners that describes rectangle.
206
207
        """
208
        max_corner, min_corner = block.get_spatial_block().get_corners()
209
210
        max_corner = [max_corner[pair[0]], max_corner[pair[1]]]
211
        min_corner = [min_corner[pair[0]], min_corner[pair[1]]]
212
213
        if pair == (0, 0):
214
            max_corner[1], min_corner[1] = 1.0, -1.0
215
216
        return max_corner, min_corner
217
218
219
class bang_animator:
220
    """!
221
    @brief Provides service for creating animation using BANG clustering results.
222
223
    """
224
    def __init__(self, directory, clusters, noise):
0 ignored issues
show
Unused Code introduced by
The argument noise seems to be unused.
Loading history...
225
        """!
226
        @brief Creates BANG animator instance.
227
228
        @param[in] directory (bang_directory): BANG directory that was formed during BANG clustering process.
229
        @param[in] clusters (list): Allocated clusters during BANG clustering process.
230
        @param[in] noise (list): Allocated noise (outliers) during BANG clustering process.
231
232
        """
233
        self.__directory = directory
234
        self.__clusters = clusters
235
        self.__noise = []
236
237
        self.__current_block = 0
238
        self.__current_level = 0
239
        self.__level_blocks = directory.get_level(0)
240
241
        self.__figure = plt.figure()
242
        self.__ax = self.__figure.add_subplot(1, 1, 1)
243
        self.__special_frame = 0
244
245
246
    def __increment_block(self):
247
        self.__current_block += 1
248
        if self.__current_block >= len(self.__level_blocks):
249
            self.__current_block = 0
250
            self.__current_level += 1
251
252
            if self.__current_level < self.__directory.get_height():
253
                self.__level_blocks = self.__directory.get_level(self.__current_level)
254
255
256 View Code Duplication
    def __draw_block(self, block, block_alpha=0.0):
0 ignored issues
show
Duplication introduced by
This code seems to be duplicated in your project.
Loading history...
257
        max_corner, min_corner = block.get_spatial_block().get_corners()
258
259
        face_color = matplotlib.colors.to_rgba('blue', alpha=block_alpha)
260
        edge_color = matplotlib.colors.to_rgba('black', alpha=1.0)
261
262
        rect = patches.Rectangle(min_corner, max_corner[0] - min_corner[0], max_corner[1] - min_corner[1],
263
                                 fill=True,
264
                                 facecolor=face_color,
265
                                 edgecolor=edge_color,
266
                                 linewidth=0.5)
267
        self.__ax.add_patch(rect)
268
269
270
    def __draw_leaf_density(self):
271
        leafs = self.__directory.get_leafs()
272
        density_scale = leafs[-1].get_density()
273
274
        for block in leafs:
275
            alpha = 0.8 * block.get_density() / density_scale
276
            self.__draw_block(block, alpha)
277
278
279
    def __draw_clusters(self):
280
        data = self.__directory.get_data()
281
        for index_cluster in range(len(self.__clusters)):
282
            color = color_list.get_color(index_cluster)
283
            self.__draw_cluster(data, self.__clusters[index_cluster], color, '.')
284
285
        self.__draw_cluster(self.__directory.get_data(), self.__noise, 'gray', 'x')
286
287
288
    def __draw_cluster(self, data, cluster, color, marker):
289
        for item in cluster:
290
            self.__ax.plot(data[item][0], data[item][1], color=color, marker=marker);
291
292
293
    def animate(self, animation_velocity=75, movie_fps=1, save_movie=None):
294
        """!
295
        @brief Animates clustering process that is performed by BANG algorithm.
296
297
        @param[in] animation_velocity (uint): Interval between frames in milliseconds (for run-time animation only).
298
        @param[in] movie_fps (uint): Defines frames per second (for rendering movie only).
299
        @param[in] save_movie (string): If it is specified then animation will be stored to file that is specified in this parameter.
300
301
        """
302
        def init_frame():
303
            self.__figure.clf()
304
            self.__ax = self.__figure.add_subplot(1, 1, 1)
305
306
            for point in self.__directory.get_data():
307
                self.__ax.plot(point[0], point[1], color='red', marker='.')
308
309
            return frame_generation(0);
310
311
312
        def frame_generation(index_iteration):
313
            if self.__current_level < self.__directory.get_height():
314
                print(index_iteration, "draw block")
315
                block = self.__level_blocks[self.__current_block]
316
                self.__draw_block(block)
317
                self.__increment_block()
318
319
            else:
320
                if self.__special_frame == 0:
321
                    print("draw density")
322
                    self.__draw_leaf_density()
323
324
                elif self.__special_frame == 1:
325
                    print("draw clusters")
326
                    self.__draw_clusters()
327
328
                elif self.__special_frame == 2:
329
                    print("draw only clusters")
330
                    self.__figure.clf()
331
                    self.__ax = self.__figure.add_subplot(1, 1, 1)
332
                    self.__draw_clusters()
333
334
                self.__special_frame += 1
335
336
337
338
        iterations = len(self.__directory) + 2
339
        print("Total number of iterations: %d" % iterations)
340
        cluster_animation = animation.FuncAnimation(self.__figure, frame_generation, iterations,
341
                                                    interval=animation_velocity,
342
                                                    init_func=init_frame,
343
                                                    repeat_delay=5000)
344
345
        if (save_movie is not None):
346
            cluster_animation.save(save_movie, writer = 'ffmpeg', fps = movie_fps, bitrate = 1500)
347
        else:
348
            plt.show()
349
350
351
class bang_directory:
352
    """!
353
    @brief BANG directory stores BANG-blocks that represents grid in data space.
354
    @details The directory build BANG-blocks in binary tree manner. Leafs of the tree stored separately to provide
355
              a direct access to the leafs that should be analysed. Leafs cache data-points.
356
357
    """
358
    def __init__(self, data, levels, density_threshold=0.0, **kwargs):
359
        """!
360
        @brief Create BANG directory - basically tree structure with direct access to leafs.
361
362
        @param[in] data (list): Input data that is clustered.
363
        @param[in] levels (uint): Height of the blocks tree.
364
        @param[in] density_threshold (double): The lowest level of density when contained data by bang-block is
365
                    considered as a noise and there is no need to split it till the last level.
366
        @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'observe').
367
368
        <b>Keyword Args:</b><br>
369
            - observe (bool): If 'True' then blocks on each level are stored.
370
371
        """
372
        self.__data = data
373
        self.__levels = levels
374
        self.__density_threshold = density_threshold
375
        self.__leafs = []
376
        self.__root = None
377
        self.__level_blocks = []
378
        self.__size = 0
379
        self.__observe = kwargs.get('observe', True)
380
381
        self.__create_directory()
382
383
384
    def __len__(self):
385
        """!
386
        @brief Returns amount of blocks that is stored in the directory
387
388
        @return (uint) Amount of blocks in the BANG directory.
389
390
        """
391
        return self.__size
392
393
394
    def get_data(self):
395
        """!
396
        @brief Return data that is stored in the directory.
397
398
        @return (list) List of points that represents stored data.
399
400
        """
401
        return self.__data
402
403
404
    def get_leafs(self):
405
        """!
406
        @brief Return leafs - the smallest blocks.
407
        @details Some leafs can be bigger than others because splitting is not performed for blocks whose density is
408
                  less than threshold.
409
410
        @return (list) List of blocks that are leafs of BANG directory.
411
412
        """
413
        return self.__leafs
414
415
416
    def get_level(self, level):
417
        """!
418
        @brief Returns BANG blocks on the specific level.
419
420
        @param[in] level (uint): Level of tree where BANG blocks are located.
421
422
        @return (list) List of BANG blocks on the specific level.
423
424
        """
425
        return self.__level_blocks[level]
426
427
428
    def get_height(self):
429
        """!
430
        @brief Returns height of BANG tree where blocks are stored.
431
432
        @return (uint) Height of BANG tree.
433
434
        """
435
        return len(self.__level_blocks)
436
437
438
    def __create_directory(self):
439
        """!
440
        @brief Create BANG directory as a tree with separate storage for leafs.
441
442
        """
443
444
        min_corner, max_corner = data_corners(self.__data)
445
        data_block = spatial_block(max_corner, min_corner)
446
447
        cache_require = (self.__levels == 1)
448
        self.__root = bang_block(self.__data, 0, 0, data_block, cache_require)
449
450
        if cache_require:
451
            self.__leafs.append(self.__root)
452
            self.__store_level_blocks([self.__root])
453
        else:
454
            self.__build_directory_levels()
455
456
457
    def __store_level_blocks(self, level_blocks):
458
        """!
459
        @brief Store level blocks if observing is enabled.
460
461
        @param[in] level_blocks (list): Created blocks on a new level.
462
463
        """
464
        self.__size += len(level_blocks)
465
        if self.__observe is True:
466
            self.__level_blocks.append(level_blocks)
467
468
469
470
    def __build_directory_levels(self):
471
        """!
472
        @brief Build levels of direction if amount of level is greater than one.
473
474
        """
475
476
        previous_level_blocks = [ self.__root ]
477
478
        for level in range(1, self.__levels):
479
            previous_level_blocks = self.__build_level(previous_level_blocks, level)
480
            self.__store_level_blocks(previous_level_blocks)
481
482
        self.__leafs = sorted(self.__leafs, key=lambda block: block.get_density())
483
484
485
    def __build_level(self, previous_level_blocks, level):
486
        """!
487
        @brief Build new level of directory.
488
489
        @param[in] previous_level_blocks (list): BANG-blocks on the previous level.
490
        @param[in] level (uint): Level number that should be built.
491
492
        @return (list) New block on the specified level.
493
494
        """
495
        current_level_blocks = []
496
497
        split_dimension = level % len(self.__data[0])
498
        cache_require = (level == self.__levels - 1)
499
500
        for block in previous_level_blocks:
501
            self.__split_block(block, split_dimension, cache_require, current_level_blocks)
502
503
        if cache_require:
504
            self.__leafs += current_level_blocks
505
506
        return current_level_blocks
507
508
509
    def __split_block(self, block, split_dimension, cache_require, current_level_blocks):
510
        """!
511
        @brief Split specific block in specified dimension.
512
        @details Split is not performed for block whose density is lower than threshold value, such blocks are putted to
513
                  leafs.
514
515
        @param[in] block (bang_block): BANG-block that should be split.
516
        @param[in] split_dimension (uint): Dimension at which splitting should be performed.
517
        @param[in] cache_require (bool): Defines when points in cache should be stored during density calculation.
518
        @param[in|out] current_level_blocks (list): Block storage at the current level where new blocks should be added.
519
520
        """
521
        if block.get_density() <= self.__density_threshold:
522
            self.__leafs.append(block)
523
524
        else:
525
            left, right = block.split(split_dimension, cache_require)
526
            current_level_blocks.append(left)
527
            current_level_blocks.append(right)
528
529
530
531
class spatial_block:
532
    """!
533
    @brief Geometrical description of BANG block in data space.
534
    @details Provides services related to spatial function and used by bang_block
535
536
    @see bang_block
537
538
    """
539
540
    def __init__(self, max_corner, min_corner):
541
        """!
542
        @brief Creates spatial block in data space.
543
544
        @param[in] max_corner (array_like): Maximum corner coordinates of the block.
545
        @param[in] min_corner (array_like): Minimal corner coordinates of the block.
546
547
        """
548
        self.__max_corner = max_corner
549
        self.__min_corner = min_corner
550
        self.__volume = self.__calculate_volume()
551
552
553
    def __str__(self):
554
        """!
555
        @brief Returns string block description.
556
557
        @return String representation of the block.
558
559
        """
560
        return "(max: %s; min: %s)" % (self.__max_corner, self.__min_corner)
561
562
563
    def __contains__(self, point):
564
        """!
565
        @brief Point is considered as contained if it lies in block (belong to it).
566
567
        @return (bool) True if point is in block, otherwise False.
568
569
        """
570
        for i in range(len(point)):
571
            if point[i] < self.__min_corner[i] or point[i] > self.__max_corner[i]:
572
                return False
573
574
        return True
575
576
577
    def get_corners(self):
578
        """!
579
        @brief Return spatial description of current block.
580
581
        @return (tuple) Pair of maximum and minimum corners (max_corner, min_corner).
582
583
        """
584
        return self.__max_corner, self.__min_corner
585
586
587
    def get_volume(self):
588
        """!
589
        @brief Returns volume of current block.
590
        @details Volume block has uncommon mining here: for 1D is length of a line, for 2D is square of rectangle,
591
                  for 3D is volume of 3D figure, and for ND is volume of ND figure.
592
593
        @return (double) Volume of current block.
594
595
        """
596
        return self.__volume
597
598
599
    def split(self, dimension):
600
        """!
601
        @brief Split current block into two spatial blocks in specified dimension.
602
603
        @param[in] dimension (uint): Dimension where current block should be split.
604
605
        @return (tuple) Pair of new split blocks from current block.
606
607
        """
608
        first_max_corner = self.__max_corner[:]
609
        second_min_corner = self.__min_corner[:]
610
611
        split_border = (self.__max_corner[dimension] + self.__min_corner[dimension]) / 2.0
612
613
        first_max_corner[dimension] = split_border
614
        second_min_corner[dimension] = split_border
615
616
        return spatial_block(first_max_corner, self.__min_corner), spatial_block(self.__max_corner, second_min_corner)
617
618
619
    def is_neighbor(self, block):
620
        """!
621
        @brief Performs calculation to identify whether specified block is neighbor of current block.
622
623
        @param[in] block (spatial_block): Another block that is check whether it is neighbor.
624
625
        @return (bool) True is blocks are neighbors, False otherwise.
626
627
        """
628
        if block is not self:
629
            block_max_corner, _ = block.get_corners()
630
            dimension = len(block_max_corner)
631
            neighborhood_score = self.__calculate_neighborhood(block_max_corner)
632
633
            if neighborhood_score == dimension:
634
                return True
635
636
        return False
637
638
639
    def __calculate_neighborhood(self, block_max_corner):
640
        """!
641
        @brief Calculates neighborhood score that defined whether blocks are neighbors.
642
643
        @param[in] block_max_corner (list): Maximum coordinates of other block.
644
645
        @return (uint) Neighborhood score.
646
647
        """
648
        dimension = len(block_max_corner)
649
650
        length_edges = [self.__max_corner[i] - self.__min_corner[i] for i in range(dimension)]
651
652
        neighborhood_score = 0
653
        for i in range(dimension):
654
            diff = abs(block_max_corner[i] - self.__max_corner[i])
655
656
            if diff <= length_edges[i] + length_edges[i] * 0.0001:
657
                neighborhood_score += 1
658
659
        return neighborhood_score
660
661
662
    def __calculate_volume(self):
663
        """!
664
        @brief Calculates volume of current spatial block.
665
666
        @return (double) Volume of current spatial block.
667
668
        """
669
        volume = self.__max_corner[0] - self.__min_corner[0]
670
        for i in range(1, len(self.__max_corner)):
671
            volume *= self.__max_corner[i] - self.__min_corner[i]
672
673
        return volume
674
675
676
677
class bang_block:
678
    """!
679
    @brief BANG-block that represent spatial region in data space.
680
681
    """
682
    def __init__(self, data, region, level, space_block, cache_points=False):
683
        """!
684
        @brief Create BANG-block.
685
686
        @param[in] data (list): List of points that are processed.
687
        @param[in] region (uint): Region number - unique value on a level.
688
        @param[in] level (uint): Level number where block is created.
689
        @param[in] space_block (spatial_block): Spatial block description in data space.
690
        @param[in] cache_points (bool): if True then points are stored in memory (used for leaf blocks).
691
692
        """
693
        self.__data = data
694
        self.__region_number = region
695
        self.__level = level
696
        self.__spatial_block = space_block
697
        self.__cache_points = cache_points
698
699
        self.__cluster = None
700
        self.__points = None
701
        self.__density = self.__calculate_density()
702
703
704
    def __str__(self):
705
        """!
706
        @brief Returns string representation of BANG-block using region number and level where block is located.
707
708
        """
709
        return "(" + str(self.__region_number) + ", " + str(self.__level) + ")"
710
711
712
    def get_region(self):
713
        """!
714
        @brief Returns region number of BANG-block.
715
        @details Region number is unique on among region numbers on a directory level. Pair of region number and level
716
                  is unique for all directory.
717
718
        @return (uint) Region number.
719
720
        """
721
        return self.__region_number
722
723
724
    def get_density(self):
725
        """!
726
        @brief Returns density of the BANG-block.
727
728
        @return (double) BANG-block density.
729
730
        """
731
        return self.__density
732
733
734
    def get_cluster(self):
735
        """!
736
        @brief Return index of cluster to which the BANG-block belongs to.
737
        @details Index of cluster may have None value if the block was not assigned to any cluster.
738
739
        @return (uint) Index of cluster or None if the block does not belong to any cluster.
740
741
        """
742
        return self.__cluster
743
744
745
    def get_spatial_block(self):
746
        """!
747
        @brief Return spatial block - BANG-block description in data space.
748
749
        @return (spatial_block) Spatial block of the BANG-block.
750
751
        """
752
        return self.__spatial_block
753
754
755
    def get_points(self):
756
        """!
757
        @brief Return points that covers by the BANG-block.
758
759
        @return (list) List of point indexes that are covered by the block.
760
761
        """
762
        if self.__points is None:
763
            self.__cache_covered_data()
764
765
        return self.__points
766
767
768
    def set_cluster(self, index):
769
        """!
770
        @brief Assign cluster to the BANG-block by index.
771
772
        @param[in] index (uint): Index cluster that is assigned to BANG-block.
773
774
        """
775
        self.__cluster = index
776
777
778
    def is_neighbor(self, block):
779
        """!
780
        @brief Performs calculation to check whether specified block is neighbor to the current.
781
782
        @param[in] block (bang_block): Other BANG-block that should be checked for neighborhood.
783
784
        @return (bool) True if blocks are neighbors, False if blocks are not neighbors.
785
786
        """
787
        return self.get_spatial_block().is_neighbor(block.get_spatial_block())
788
789
790
    def split(self, split_dimension, cache_points):
791
        """!
792
        @brief Split BANG-block into two new blocks in specified dimension.
793
794
        @param[in] split_dimension (uint): Dimension where block should be split.
795
        @param[in] cache_points (bool): If True then covered points are cached. Used for leaf blocks.
796
797
        @return (tuple) Pair of BANG-block that were formed from the current.
798
799
        """
800
        left_region_number = self.__region_number
801
        right_region_number = self.__region_number + 2 ** self.__level
802
803
        first_spatial_block, second_spatial_block = self.__spatial_block.split(split_dimension)
804
805
        left = bang_block(self.__data, left_region_number, self.__level + 1, first_spatial_block, cache_points)
806
        right = bang_block(self.__data, right_region_number, self.__level + 1, second_spatial_block, cache_points)
807
808
        return left, right
809
810
811
    def __calculate_density(self):
812
        """!
813
        @brief Calculates BANG-block density.
814
815
        @return (double) BANG-block density.
816
817
        """
818
        return self.__get_amount_points() / self.__spatial_block.get_volume()
819
820
821
    def __get_amount_points(self):
822
        """!
823
        @brief Count covered points by the BANG-block and if cache is enable then covered points are stored.
824
825
        @return (uint) Amount of covered points.
826
827
        """
828
        amount = 0
829
        for index in range(len(self.__data)):
830
            if self.__data[index] in self.__spatial_block:
831
                self.__cache_point(index)
832
                amount += 1
833
834
        return amount
835
836
837
    def __cache_covered_data(self):
838
        """!
839
        @brief Cache covered data.
840
841
        """
842
        self.__cache_points = True
843
        self.__points = []
844
845
        for index_point in range(len(self.__data)):
846
            if self.__data[index_point] in self.__spatial_block:
847
                self.__cache_point(index_point)
848
849
850
    def __cache_point(self, index):
851
        """!
852
        @brief Store index points.
853
854
        @param[in] index (uint): Index point that should be stored.
855
856
        """
857
        if self.__cache_points:
858
            if self.__points is None:
859
                self.__points = []
860
861
            self.__points.append(index)
862
863
864
865
class bang:
866
    """!
867
    @brief Class implements BANG grid based clustering algorithm.
868
    @details BANG clustering algorithms uses a multidimensional grid structure to organize the value space surrounding
869
              the pattern values. The patterns are grouped into blocks and clustered with respect to the blocks by
870
              a topological neighbor search algorithm @cite inproceedings::bang::1.
871
872
    Code example of BANG usage:
873
    @code
874
        from pyclustering.cluster.bang import bang, bang_visualizer
875
        from pyclustering.utils import read_sample
876
        from pyclustering.samples.definitions import FCPS_SAMPLES
877
878
        # Read data three dimensional data.
879
        data = read_sample(FCPS_SAMPLES.SAMPLE_CHAINLINK)
880
881
        # Prepare algorithm's parameters.
882
        levels = 11
883
884
        # Create instance of BANG algorithm.
885
        bang_instance = bang(data, levels)
886
        bang_instance.process()
887
888
        # Obtain clustering results.
889
        clusters = bang_instance.get_clusters()
890
        noise = bang_instance.get_noise()
891
        directory = bang_instance.get_directory()
892
        dendrogram = bang_instance.get_dendrogram()
893
894
        # Visualize BANG clustering results.
895
        bang_visualizer.show_blocks(directory)
896
        bang_visualizer.show_dendrogram(dendrogram)
897
        bang_visualizer.show_clusters(data, clusters, noise)
898
    @endcode
899
900
    There is visualization of BANG-clustering of three-dimensional data 'chainlink'. BANG-blocks that were formed during
901
    processing are shown on following figure. The darkest color means highest density, blocks that does not cover points
902
    are transparent:
903
    @image html bang_blocks_chainlink.png "Fig. 1. BANG-blocks that cover input data."
904
905
    Here is obtained dendrogram that can be used for further analysis to improve clustering results:
906
    @image html bang_dendrogram_chainlink.png "Fig. 2. BANG dendrogram where the X-axis contains BANG-blocks, the Y-axis contains density."
907
908
    BANG clustering result of 'chainlink' data:
909
    @image html bang_clustering_chainlink.png "Fig. 3. BANG clustering result. Data: 'chainlink'."
910
911
    """
912
913
    def __init__(self, data, levels, density_threshold=0.0, ccore=False):
914
        """!
915
        @brief Create BANG clustering algorithm.
916
917
        @param[in] data (list): Input data (list of points) that should be clustered.
918
        @param[in] levels (uint): Amount of levels in tree (how many times block should be split).
919
        @param[in] density_threshold (double): If block density is smaller than this value then contained data by this
920
                    block is considered as a noise.
921
        @param[in] ccore (bool): Reserved positional argument - not used yet.
922
923
        """
924
        self.__data = data
925
        self.__levels = levels
926
        self.__directory = None
927
        self.__clusters = []
928
        self.__noise = []
929
        self.__cluster_blocks = []
930
        self.__dendrogram = []
931
        self.__density_threshold = density_threshold
932
        self.__ccore = ccore
933
934
        self.__validate_arguments()
935
936
937
    def process(self):
938
        """!
939
        @brief Performs clustering process in line with rules of BANG clustering algorithm.
940
941
        @see get_clusters()
942
        @see get_noise()
943
        @see get_directory()
944
        @see get_dendrogram()
945
946
        """
947
        self.__directory = bang_directory(self.__data, self.__levels, self.__density_threshold)
948
        self.__allocate_clusters()
949
950
951
    def get_clusters(self):
952
        """!
953
        @brief Returns allocated clusters.
954
955
        @remark Allocated clusters are returned only after data processing (method process()). Otherwise empty list is returned.
956
957
        @return (list) List of allocated clusters, each cluster contains indexes of objects in list of data.
958
959
        @see process()
960
        @see get_noise()
961
962
        """
963
        return self.__clusters
964
965
966
    def get_noise(self):
967
        """!
968
        @brief Returns allocated noise.
969
970
        @remark Allocated noise is returned only after data processing (method process()). Otherwise empty list is returned.
971
972
        @return (list) List of indexes that are marked as a noise.
973
974
        @see process()
975
        @see get_clusters()
976
977
        """
978
        return self.__noise
979
980
981
    def get_directory(self):
982
        """!
983
        @brief Returns grid directory that describes grid of the processed data.
984
985
        @remark Grid directory is returned only after data processing (method process()). Otherwise None value is returned.
986
987
        @return (bang_directory) BANG directory that describes grid of process data.
988
989
        @see process()
990
991
        """
992
        return self.__directory
993
994
995
    def get_dendrogram(self):
996
        """!
997
        @brief Returns dendrogram of clusters.
998
        @details Dendrogram is created in following way: the density indices of all regions are calculated and sorted
999
                  in decreasing order for each cluster during clustering process.
1000
1001
        @remark Dendrogram is returned only after data processing (method process()). Otherwise empty list is returned.
1002
1003
        """
1004
        return self.__dendrogram
1005
1006
1007
    def get_cluster_encoding(self):
1008
        """!
1009
        @brief Returns clustering result representation type that indicate how clusters are encoded.
1010
1011
        @return (type_encoding) Clustering result representation.
1012
1013
        @see get_clusters()
1014
1015
        """
1016
1017
        return type_encoding.CLUSTER_INDEX_LIST_SEPARATION
1018
1019
1020
    def __validate_arguments(self):
1021
        """!
1022
        @brief Check input arguments of BANG algorithm and if one of them is not correct then appropriate exception
1023
                is thrown.
1024
1025
        """
1026
        if self.__levels <= 0:
1027
            raise ValueError("Incorrect amount of levels '%d'. Level value should be greater than 0." % self.__levels)
1028
1029
        if len(self.__data) == 0:
1030
            raise ValueError("Empty input data. Data should contain at least one point.")
1031
1032
        if self.__density_threshold < 0:
1033
            raise ValueError("Incorrect density threshold '%f'. Density threshold should not be negative." % self.__density_threshold)
1034
1035
1036
    def __allocate_clusters(self):
1037
        """!
1038
        @brief Performs cluster allocation using leafs of tree in BANG directory (the smallest cells).
1039
1040
        """
1041
        leaf_blocks = self.__directory.get_leafs()
1042
        unhandled_block_indexes = set([i for i in range(len(leaf_blocks)) if leaf_blocks[i].get_density() > self.__density_threshold])
1043
1044
        current_block = self.__find_block_center(leaf_blocks, unhandled_block_indexes)
1045
        cluster_index = 0
1046
1047
        while current_block is not None:
1048
            if current_block.get_density() <= self.__density_threshold:
1049
                break
1050
1051
            self.__expand_cluster_block(current_block, cluster_index, leaf_blocks, unhandled_block_indexes)
1052
1053
            current_block = self.__find_block_center(leaf_blocks, unhandled_block_indexes)
1054
            cluster_index += 1
1055
1056
        self.__store_clustering_results(cluster_index, leaf_blocks)
1057
1058
1059
    def __expand_cluster_block(self, block, cluster_index, leaf_blocks, unhandled_block_indexes):
1060
        """!
1061
        @brief Expand cluster from specific block that is considered as a central block.
1062
1063
        @param[in] block (bang_block): Block that is considered as a central block for cluster.
1064
        @param[in] cluster_index (uint): Index of cluster that is assigned to blocks that forms new cluster.
1065
        @param[in] leaf_blocks (list): Leaf BANG-blocks that are considered during cluster formation.
1066
        @param[in] unhandled_block_indexes (set): Set of candidates (BANG block indexes) to become a cluster member. The
1067
                    parameter helps to reduce traversing among BANG-block providing only restricted set of block that
1068
                    should be considered.
1069
1070
        """
1071
1072
        block.set_cluster(cluster_index)
1073
        self.__update_cluster_dendrogram(cluster_index, [block])
1074
1075
        neighbors = self.__find_block_neighbors(block, leaf_blocks, unhandled_block_indexes)
1076
        self.__update_cluster_dendrogram(cluster_index, neighbors)
1077
1078
        for neighbor in neighbors:
1079
            neighbor.set_cluster(cluster_index)
1080
            neighbor_neighbors = self.__find_block_neighbors(neighbor, leaf_blocks, unhandled_block_indexes)
1081
            self.__update_cluster_dendrogram(cluster_index, neighbor_neighbors)
1082
1083
            neighbors += neighbor_neighbors
1084
1085
1086
    def __store_clustering_results(self, amount_clusters, leaf_blocks):
1087
        """!
1088
        @brief Stores clustering results in a convenient way.
1089
1090
        @param[in] amount_clusters (uint): Amount of cluster that was allocated during processing.
1091
        @param[in] leaf_blocks (list): Leaf BANG-blocks (the smallest cells).
1092
1093
        """
1094
        self.__clusters = [[] for _ in range(amount_clusters)]
1095
        for block in leaf_blocks:
1096
            index = block.get_cluster()
1097
1098
            if index is not None:
1099
                self.__clusters[index] += block.get_points()
1100
            else:
1101
                self.__noise += block.get_points()
1102
1103
        self.__clusters = [ list(set(cluster)) for cluster in self.__clusters ]
1104
        self.__noise = list(set(self.__noise))
1105
1106
1107
    def __find_block_center(self, level_blocks, unhandled_block_indexes):
1108
        """!
1109
        @brief Search block that is cluster center for new cluster.
1110
1111
        @return (bang_block) Central block for new cluster, if cluster is not found then None value is returned.
1112
1113
        """
1114
        for i in reversed(range(len(level_blocks))):
1115
            if level_blocks[i].get_density() <= self.__density_threshold:
1116
                return None
1117
1118
            if level_blocks[i].get_cluster() is None:
1119
                unhandled_block_indexes.remove(i)
1120
                return level_blocks[i]
1121
1122
        return None
1123
1124
1125
    def __find_block_neighbors(self, block, level_blocks, unhandled_block_indexes):
1126
        """!
1127
        @brief Search block neighbors that are parts of new clusters (density is greater than threshold and that are
1128
                not cluster members yet), other neighbors are ignored.
1129
1130
        @param[in] block (bang_block): BANG-block for which neighbors should be found (which can be part of cluster).
1131
        @param[in] level_blocks (list): BANG-blocks on specific level.
1132
        @param[in] unhandled_block_indexes (set): Blocks that have not been processed yet.
1133
1134
        @return (list) Block neighbors that can become part of cluster.
1135
1136
        """
1137
        neighbors = []
1138
1139
        handled_block_indexes = []
1140
        for unhandled_index in unhandled_block_indexes:
1141
            if block.is_neighbor(level_blocks[unhandled_index]):
1142
                handled_block_indexes.append(unhandled_index)
1143
                neighbors.append(level_blocks[unhandled_index])
1144
1145
                # Maximum number of neighbors is eight
1146
                if len(neighbors) == 8:
1147
                    break
1148
1149
        for handled_index in handled_block_indexes:
1150
            unhandled_block_indexes.remove(handled_index)
1151
1152
        return neighbors
1153
1154
1155
    def __update_cluster_dendrogram(self, index_cluster, blocks):
1156
        """!
1157
        @brief Append clustered blocks to dendrogram.
1158
1159
        @param[in] index_cluster (uint): Cluster index that was assigned to blocks.
1160
        @param[in] blocks (list): Blocks that were clustered.
1161
1162
        """
1163
        if len(self.__dendrogram) <= index_cluster:
1164
            self.__dendrogram.append([])
1165
1166
        blocks = sorted(blocks, key=lambda block: block.get_density(), reverse=True)
1167
        self.__dendrogram[index_cluster] += blocks
1168
1169