Completed
Push — master ( c3e4f5...e161a0 )
by Andrei
01:28
created

optics   B

Complexity

Total Complexity 46

Size/Duplication

Total Lines 351
Duplicated Lines 0 %

Importance

Changes 0
Metric Value
dl 0
loc 351
rs 8.3999
c 0
b 0
f 0
wmc 46

13 Methods

Rating   Name   Duplication   Size   Complexity  
A get_clusters() 0 14 1
A get_radius() 0 14 1
A __neighbor_indexes() 0 21 4
B process() 0 24 5
B __expand_cluster_order() 0 45 6
A get_cluster_encoding() 0 11 1
D __extract_clusters() 0 25 8
A __allocate_clusters() 0 13 3
B __init__() 0 24 1
B get_ordering() 0 24 5
A __initialize() 0 12 2
D __update_order_seed() 0 32 8
A get_noise() 0 14 1

How to fix   Complexity   

Complex Class

Complex classes like optics often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to find such a component is to look for fields/methods that share the same prefixes, or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

1
"""!
2
3
@brief Cluster analysis algorithm: OPTICS (Ordering Points To Identify Clustering Structure)
4
@details Based on article description:
5
         - M.Ankerst, M.Breunig, H.Kriegel, J.Sander. OPTICS: Ordering Points To Identify the Clustering Structure. 1999.
6
7
@authors Andrei Novikov ([email protected])
8
@date 2014-2017
9
@copyright GNU Public License
10
11
@cond GNU_PUBLIC_LICENSE
12
    PyClustering is free software: you can redistribute it and/or modify
13
    it under the terms of the GNU General Public License as published by
14
    the Free Software Foundation, either version 3 of the License, or
15
    (at your option) any later version.
16
    
17
    PyClustering is distributed in the hope that it will be useful,
18
    but WITHOUT ANY WARRANTY; without even the implied warranty of
19
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
20
    GNU General Public License for more details.
21
    
22
    You should have received a copy of the GNU General Public License
23
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
24
@endcond
25
26
"""
27
28
29
from pyclustering.cluster.encoder import type_encoding;
30
31
from pyclustering.utils import euclidean_distance;
32
33
import matplotlib.pyplot as plt;
0 ignored issues
show
Configuration introduced by
The import matplotlib.pyplot could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Loading history...
34
35
import pyclustering.core.optics_wrapper as wrapper;
0 ignored issues
show
Bug introduced by
The name optics_wrapper does not seem to exist in module pyclustering.core.
Loading history...
Configuration introduced by
The import pyclustering.core.optics_wrapper could not be resolved.

This can be caused by one of the following:

1. Missing Dependencies

This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands.

# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version.

2. Missing __init__.py files

This error could also result from missing __init__.py files in your module folders. Make sure that you place one file in each sub-folder.

Loading history...
36
37
38
class ordering_visualizer:
39
    """!
40
    @brief Cluster ordering diagram visualizer that represents dataset graphically.
41
    
42
    """
43
    
44
    @staticmethod
45
    def show_ordering_diagram(analyser):
46
        ordering = analyser.cluster_ordering;
47
        indexes = [i for i in range(0, len(ordering))];
48
        
49
        plt.bar(indexes, ordering);
50
        plt.show();
51
52
53
class ordering_analyser:
54
    """!
55
    @brief Analyser of cluster ordering diagram.
56
    
57
    """
58
    
59
    @property
60
    def cluster_ordering(self):
61
        """!
62
        @brief (list) Returns values of dataset cluster ordering.
63
        
64
        """
65
        return self.__ordering;
66
    
67
    
68
    def __init__(self, ordering_diagram):
69
        """!
70
        @brief Analyser of ordering diagram that is based on reachability-distances.
71
        
72
        @see calculate_connvectivity_radius
73
        
74
        """
75
        self.__ordering = ordering_diagram;
76
    
77
    
78
    def __len__(self):
79
        """!
80
        @brief Returns length of clustering-ordering diagram.
81
        
82
        """
83
        return len(self.__ordering);
84
    
85
    
86
    def calculate_connvectivity_radius(self, amount_clusters):
87
        """!
88
        @brief Calculates connectivity radius of allocation specified amount of clusters using ordering diagram.
89
        
90
        @return (double) Value of connectivity radius.
91
        
92
        """
93
        
94
        maximum_distance = max(self.__ordering);
95
        
96
        upper_distance = maximum_distance;
97
        lower_distance = 0.0;
98
        
99
        radius = None;
100
        
101
        if (self.extract_cluster_amount(maximum_distance) <= amount_clusters):
102
            while(True):
103
                radius = (lower_distance + upper_distance) / 2.0;
104
                
105
                amount = self.extract_cluster_amount(radius);
106
                if (amount == amount_clusters):
107
                    break;
108
                
109
                elif (amount > amount_clusters):
110
                    lower_distance = radius;
111
                
112
                elif (amount < amount_clusters):
113
                    upper_distance = radius;
114
        
115
        return radius;
116
    
117
    
118
    def extract_cluster_amount(self, radius):
119
        """!
120
        @brief Obtains amount of clustering that can be allocated by using specified radius for ordering diagram.
121
        @details When growth of reachability-distances is detected than it is considered as a start point of cluster, 
122
                 than pick is detected and after that recession is observed until new growth (that means end of the
123
                 current cluster and start of a new one) or end of diagram.
124
        
125
        @param[in] ordering (list): list of reachability-distances of optics objects.
126
        @param[in] radius (double): connectivity radius that is used for cluster allocation.
127
        
128
        
129
        @return (unit) Amount of clusters that can be allocated by the connectivity radius on ordering diagram.
130
        
131
        """
132
        
133
        amount_clusters = 1;
134
        
135
        cluster_start = False;
136
        cluster_pick = False;
137
        previous_distance = 0.0;
138
        
139
        for distance in self.__ordering:
140
            if (distance >= radius):
141
                if (cluster_start is False):
142
                    cluster_start = True;
143
                    amount_clusters += 1;
144
                
145
                else:
146
                    if ((distance < previous_distance) and (cluster_pick is False)):
147
                        cluster_pick = True;
148
                    
149
                    elif ((distance > previous_distance) and (cluster_pick is True)):
150
                        cluster_pick = False;
151
                        amount_clusters += 1;
152
                
153
                previous_distance = distance;
154
            
155
            else:
156
                cluster_start = False;
157
                cluster_pick = False;
158
        
159
        return amount_clusters;
160
161
162
class optics_descriptor:
163
    """!
164
    @brief Object description that used by OPTICS algorithm for cluster analysis.
165
    
166
    """
167
168
    def __init__(self, index, core_distance = None, reachability_distance = None):
169
        """!
170
        @brief Constructor of object description in optics terms.
171
        
172
        @param[in] index (uint): Index of the object in the data set.
173
        @param[in] core_distance (double): Core distance that is minimum distance to specified number of neighbors.
174
        @param[in] reachability_distance (double): Reachability distance to this object.
175
        
176
        """
177
        
178
        ## Reachability distance - the smallest distance to be reachable by core object.
179
        self.index_object = index;
180
        
181
        ## Core distance - the smallest distance to reach specified number of neighbors that is not greater then connectivity radius.
182
        self.core_distance = core_distance;
183
        
184
        ## Index of object from the input data.
185
        self.reachability_distance = reachability_distance;
186
        
187
        ## True is object has been already traversed.
188
        self.processed = False;
189
190
    def __repr__(self):
191
        """!
192
        @brief Returns string representation of the optics descriptor.
193
        
194
        """
195
        
196
        return '(%s, [c: %s, r: %s])' % (self.index_object, self.core_distance, self.reachability_distance);
197
198
199
class optics:
200
    """!
201
    @brief Class represents clustering algorithm OPTICS (Ordering Points To Identify Clustering Structure).
202
    @details OPTICS is a density-based algorithm. Purpose of the algorithm is to provide explicit clusters, but create clustering-ordering representation of the input data. 
203
             Clustering-ordering information contains information about internal structures of data set in terms of density and proper connectivity radius can be obtained
204
             for allocation required amount of clusters using this diagram. In case of usage additional input parameter 'amount of clusters' connectivity radius should be
205
             bigger than real - because it will be calculated by the algorithms.
206
207
    Example:
208
    @code
209
        # Read sample for clustering from some file
210
        sample = read_sample(path_sample);
211
        
212
        # Create OPTICS algorithm for cluster analysis
213
        optics_instance = optics(sample, 0.5, 6);
214
        
215
        # Run cluster analysis
216
        optics_instance.process();
217
        
218
        # Obtain results of clustering
219
        clusters = optics_instance.get_clusters();
220
        noise = optics_instance.get_noise();
221
        
222
        # Obtain rechability-distances
223
        ordering = ordering_analyser(optics_instance.get_ordering());
224
        
225
        # Visualization of cluster ordering in line with reachability distance.
226
        ordering_visualizer.show_ordering_diagram(ordering);
227
    @endcode
228
    
229
    Amount of clusters that should be allocated can be also specified. In this case connectivity radius should be greater than real, for example:
230
    @code
231
        # Import required packages
232
        from pyclustering.cluster.optics import optics;
233
        from pyclustering.samples.definitions import FCPS_SAMPLES;
234
        from pyclustering.utils import read_sample;
235
        
236
        # Read sample for clustering from some file
237
        sample = read_sample(FCPS_SAMPLES.SAMPLE_LSUN);
238
        
239
        # Run cluster analysis where connvectivity radius is bigger than real
240
        radius = 2.0;
241
        neighbors = 3;
242
        amount_of_clusters = 3;
243
        
244
        optics_instance = optics(sample, radius, neighbors, amount_of_clusters);
245
        
246
        # Obtain results of clustering
247
        clusters = optics_instance.get_clusters();
248
        noise = optics_instance.get_noise();
249
    @endcode
250
    
251
    """
252
    
253
    def __init__(self, sample, eps, minpts, amount_clusters = None, ccore = False):
254
        """!
255
        @brief Constructor of clustering algorithm OPTICS.
256
        
257
        @param[in] sample (list): Input data that is presented as a list of points (objects), where each point is represented by list or tuple.
258
        @param[in] eps (double): Connectivity radius between points, points may be connected if distance between them less than the radius.
259
        @param[in] minpts (uint): Minimum number of shared neighbors that is required for establishing links between points.
260
        @param[in] amount_clusters (uint): Optional parameter where amount of clusters that should be allocated is specified.
261
                    In case of usage 'amount_clusters' connectivity radius can be greater than real, in other words, there is place for mistake
262
                    in connectivity radius usage.
263
        @param[in] ccore (bool): if True than DLL CCORE (C++ solution) will be used for solving the problem.
264
        
265
        """
266
        
267
        self.__sample_pointer = sample;     # Algorithm parameter - pointer to sample for processing.
268
        self.__eps = eps;                   # Algorithm parameter - connectivity radius between object for establish links between object.
269
        self.__minpts = minpts;             # Algorithm parameter - minimum number of neighbors that is required for establish links between object.
270
        self.__amount_clusters = amount_clusters;
271
        
272
        self.__ordering = None;
273
        self.__clusters = None;
274
        self.__noise = None;
275
        
276
        self.__ccore = ccore;
277
278
279
    def process(self):
280
        """!
281
        @brief Performs cluster analysis in line with rules of OPTICS algorithm.
282
        
283
        @remark Results of clustering can be obtained using corresponding gets methods.
284
        
285
        @see get_clusters()
286
        @see get_noise()
287
        @see get_ordering()
288
        
289
        """
290
        
291
        if (self.__ccore is True):
292
            (self.__clusters, self.__noise, self.__ordering, self.__eps) = wrapper.optics(self.__sample_pointer, self.__eps, self.__minpts, self.__amount_clusters);
293
        
294
        else:
295
            self.__allocate_clusters();
296
            
297
            if ( (self.__amount_clusters is not None) and (self.__amount_clusters != len(self.get_clusters())) ):
298
                analyser = ordering_analyser(self.get_ordering());
299
                radius = analyser.calculate_connvectivity_radius(self.__amount_clusters);
300
                if (radius is not None):
301
                    self.__eps = radius;
302
                    self.__allocate_clusters();
303
304
305
    def __initialize(self, sample):
306
        """!
307
        @brief Initializes internal states and resets clustering results in line with input sample.
308
        
309
        """
310
        
311
        self.__processed = [False] * len(sample);
312
        self.__optics_objects = [optics_descriptor(i) for i in range(len(sample))];     # List of OPTICS objects that corresponds to objects from input sample.
313
        self.__ordered_database = [];       # List of OPTICS objects in traverse order. 
314
        
315
        self.__clusters = None;     # Result of clustering (list of clusters where each cluster contains indexes of objects from input data).
316
        self.__noise = None;        # Result of clustering (noise).
317
318
319
    def __allocate_clusters(self):
320
        """!
321
        @brief Performs cluster allocation and builds ordering diagram that is based on reachability-distances.
322
        
323
        """
324
        
325
        self.__initialize(self.__sample_pointer);
326
        
327
        for optic_object in self.__optics_objects:
328
            if (optic_object.processed is False):
329
                self.__expand_cluster_order(optic_object);
330
        
331
        self.__extract_clusters();
332
    
333
    
334
    def get_clusters(self):
335
        """!
336
        @brief Returns list of allocated clusters, where each cluster contains indexes of objects and each cluster is represented by list.
337
        
338
        @return (list) List of allocated clusters.
339
        
340
        @see process()
341
        @see get_noise()
342
        @see get_ordering()
343
        @see get_radius()
344
        
345
        """
346
        
347
        return self.__clusters;
348
    
349
    
350
    def get_noise(self):
351
        """!
352
        @brief Returns list of noise that contains indexes of objects that corresponds to input data.
353
        
354
        @return (list) List of allocated noise objects.
355
        
356
        @see process()
357
        @see get_clusters()
358
        @see get_ordering()
359
        @see get_radius()
360
        
361
        """
362
        
363
        return self.__noise;
364
    
365
    
366
    def get_ordering(self):
367
        """!
368
        @brief Returns clustering ordering information about the input data set.
369
        @details Clustering ordering of data-set contains the information about the internal clustering structure in line with connectivity radius.
370
        
371
        @return (ordering_analyser) Analyser of clustering ordering.
372
        
373
        @see process()
374
        @see get_clusters()
375
        @see get_noise()
376
        @see get_radius()
377
        
378
        """
379
        
380
        if (self.__ordering is None):
381
            self.__ordering = [];
382
        
383
            for cluster in self.__clusters:
384
                for index_object in cluster:
385
                    optics_object = self.__optics_objects[index_object];
386
                    if (optics_object.reachability_distance is not None):
387
                        self.__ordering.append(optics_object.reachability_distance);
388
            
389
        return self.__ordering;
390
    
391
    
392
    def get_radius(self):
393
        """!
394
        @brief Returns connectivity radius that is calculated and used for clustering by the algorithm.
395
        @details Connectivity radius may be changed only in case of usage additional parameter of the algorithm - amount of clusters for allocation.
396
        
397
        @return (double) Connectivity radius.
398
        
399
        @see get_ordering()
400
        @see get_clusters()
401
        @see get_noise()
402
        
403
        """
404
        
405
        return self.__eps;
406
    
407
408
    def get_cluster_encoding(self):
409
        """!
410
        @brief Returns clustering result representation type that indicate how clusters are encoded.
411
        
412
        @return (type_encoding) Clustering result representation.
413
        
414
        @see get_clusters()
415
        
416
        """
417
        
418
        return type_encoding.CLUSTER_INDEX_LIST_SEPARATION;
419
420
421
    def __expand_cluster_order(self, optics_object):
422
        """!
423
        @brief Expand cluster order from not processed optic-object that corresponds to object from input data.
424
               Traverse procedure is performed until objects are reachable from core-objects in line with connectivity radius.
425
               Order database is updated during expanding.
426
               
427
        @param[in] optics_object (optics_descriptor): Object that hasn't been processed.
428
        
429
        """
430
        
431
        optics_object.processed = True;
432
        
433
        neighbors_descriptor = self.__neighbor_indexes(optics_object);
434
        optics_object.reachability_distance = None;
435
        
436
        self.__ordered_database.append(optics_object);
437
        
438
        # Check core distance
439
        if (len(neighbors_descriptor) >= self.__minpts):
440
            neighbors_descriptor.sort(key = lambda obj: obj[1]);
441
            optics_object.core_distance = neighbors_descriptor[self.__minpts - 1][1];
442
            
443
            # Continue processing
444
            order_seed = list();
445
            self.__update_order_seed(optics_object, neighbors_descriptor, order_seed);
446
            
447
            while(len(order_seed) > 0):
448
                optic_descriptor = order_seed[0];
449
                order_seed.remove(optic_descriptor);
450
                
451
                neighbors_descriptor = self.__neighbor_indexes(optic_descriptor);
452
                optic_descriptor.processed = True;
453
                
454
                self.__ordered_database.append(optic_descriptor);
455
                
456
                if (len(neighbors_descriptor) >= self.__minpts):
457
                    neighbors_descriptor.sort(key = lambda obj: obj[1]);
458
                    optic_descriptor.core_distance = neighbors_descriptor[self.__minpts - 1][1];
459
                    
460
                    self.__update_order_seed(optic_descriptor, neighbors_descriptor, order_seed);
461
                else:
462
                    optic_descriptor.core_distance = None;
463
                    
464
        else:
465
            optics_object.core_distance = None;
466
467
    
468
    def __extract_clusters(self):
469
        """!
470
        @brief Extract clusters and noise from order database.
471
        
472
        """
473
     
474
        self.__clusters = [];
475
        self.__noise = [];
476
477
        current_cluster = [];
478
        for optics_object in self.__ordered_database:
479
            if ((optics_object.reachability_distance is None) or (optics_object.reachability_distance > self.__eps)):
480
                if ((optics_object.core_distance is not None) and (optics_object.core_distance <= self.__eps)):
481
                    if (len(current_cluster) > 0):
482
                        self.__clusters.append(current_cluster);
483
                        current_cluster = [];
484
                        
485
                    current_cluster.append(optics_object.index_object);
486
                else:
487
                    self.__noise.append(optics_object.index_object);
488
            else:
489
                current_cluster.append(optics_object.index_object);
490
        
491
        if (len(current_cluster) > 0):
492
            self.__clusters.append(current_cluster);
493
                
494
495
    def __update_order_seed(self, optic_descriptor, neighbors_descriptors, order_seed):
496
        """!
497
        @brief Update sorted list of reachable objects (from core-object) that should be processed using neighbors of core-object.
498
        
499
        @param[in] optic_descriptor (optics_descriptor): Core-object whose neighbors should be analysed.
500
        @param[in] neighbors_descriptors (list): List of neighbors of core-object.
501
        @param[in|out] order_seed (list): List of sorted object in line with reachable distance.
502
        
503
        """
504
        
505
        for neighbor_descriptor in neighbors_descriptors:
506
            index_neighbor = neighbor_descriptor[0];
507
            current_reachable_distance = neighbor_descriptor[1];
508
            
509
            if (self.__optics_objects[index_neighbor].processed != True):
510
                reachable_distance = max(current_reachable_distance, optic_descriptor.core_distance);
511
                if (self.__optics_objects[index_neighbor].reachability_distance is None):
512
                    self.__optics_objects[index_neighbor].reachability_distance = reachable_distance;
513
                    
514
                    # insert element in queue O(n) - worst case.
515
                    index_insertion = len(order_seed);
516
                    for index_seed in range(0, len(order_seed)):
517
                        if (reachable_distance < order_seed[index_seed].reachability_distance):
518
                            index_insertion = index_seed;
519
                            break;
520
                    
521
                    order_seed.insert(index_insertion, self.__optics_objects[index_neighbor]);
522
523
                else:
524
                    if (reachable_distance < self.__optics_objects[index_neighbor].reachability_distance):
525
                        self.__optics_objects[index_neighbor].reachability_distance = reachable_distance;
526
                        order_seed.sort(key = lambda obj: obj.reachability_distance);
527
528
529
    def __neighbor_indexes(self, optic_object):
530
        """!
531
        @brief Return list of indexes of neighbors of specified point for the data.
532
        
533
        @param[in] optic_object (optics_descriptor): Object for which neighbors should be returned in line with connectivity radius.
534
        
535
        @return (list) List of indexes of neighbors in line the connectivity radius.
536
        
537
        """
538
              
539
        neighbor_description = [];
540
        
541
        for index in range(0, len(self.__sample_pointer), 1):
542
            if (index == optic_object.index_object):
543
                continue;
544
            
545
            distance = euclidean_distance(self.__sample_pointer[optic_object.index_object], self.__sample_pointer[index]);
546
            if (distance <= self.__eps):
547
                neighbor_description.append( [index, distance] );
548
            
549
        return neighbor_description;