pyclustering.cluster.cure - Code Metrics - Inspection of "[refactoring] ant_colony params [phase 2 - final]" - annoviko/pyclustering - Measure and Improve Code Quality continuously with Scrutinizer

Completed

Push — master ( a63a82...61f52c )

unknown

created 2016-04-12 15:26 UTC

pyclustering.cluster.cure C

↳ Parent: Project

Complexity

Total Complexity

Size/Duplication

Total Lines	348
Duplicated Lines	0 %

Metric	Value
dl	0
loc	348
rs	6.8539
wmc	54

12 Methods

Rating	Name	Size	Complexity
A	cure.__init__()	22	2
A	cure.__delete_represented_points()	10	2
C	cure.__merge_clusters()	54	10
F	cure.process()	78	13
A	cure.__create_kdtree()	12	3
B	cure.__closest_cluster()	23	6
A	cure.__cluster_distance()	20	4
C	cure.__create_queue()	29	7
A	cure.__insert_represented_points()	10	2
A	cure.__relocate_cluster()	10	1
A	cure.__insert_cluster()	14	3
A	cure.get_clusters()	11	1

How to fix Complexity

"""!

@brief Cluster analysis algorithm: CURE
@details Implementation based on article:
         - S.Guha, R.Rastogi, K.Shim. CURE: An Efficient Clustering Algorithm for Large Databases. 1998.

@authors Andrei Novikov ([email protected])
@date 2014-2016
@copyright GNU Public License

@cond GNU_PUBLIC_LICENSE
    PyClustering is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.
    
    PyClustering is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.
    
    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
@endcond

"""

from pyclustering.utils import euclidean_distance;
from pyclustering.utils import euclidean_distance_sqrt;


from pyclustering.container.kdtree import kdtree;

import pyclustering.core.wrapper as wrapper;

class cure_cluster:   
    """!
    @brief Represents data cluster in CURE term. CURE cluster is described by points of cluster, representation points of the cluster and by the cluster center.
    
    """
    
    ## List of points that make up cluster.
    points = None;
    
    ## Mean of points that make up cluster.
    mean = None;
    
    ## List of points that represents clusters.
    rep = None;
    
    ## Pointer to the closest cluster.
    closest = None;
    
    ## Distance to the closest cluster.
    distance = None;
    
    def __init__(self, point = None):
        """!
        @brief Constructor of CURE cluster.
        
        @param[in] point (list): Point represented by list of coordinates.
        
        """
        if (point is not None):
            self.points = [ point ];
            self.mean = point;
            self.rep = [ point ];
        else:
            self.points = [ ];
            self.mean = None;
            self.rep = [ ];
            
        self.closest = None;
        self.distance = float('inf');      # calculation of distance is really complexity operation (even square distance), so let's store distance to closest cluster.
        
    def __repr__(self):
        """!
        @brief Displays distance to closest cluster and points that are contained by current cluster.
        
        """
        return "%s, %s" % (self.distance, self.points);
        

class cure:
    """!
    @brief Class represents clustering algorithm CURE.
    
    Example:
    @code
        # read data for clustering from some file
        sample = read_sample(path_to_data);
        
        # create instance of cure algorithm for cluster analysis
        # request for allocation of two clusters.
        cure_instance = cure(sample, 2, 5, 0.5, True);
        
        # run cluster analysis
        cure_instance.process();
        
        # get results of clustering
        clusters = cure_instance.get_clusters();
    @endcode
    
    """
    __pointer_data = None;
    __clusters = None;
    
    __number_cluster = 0;
    __number_represent_points = 0;
    __compression = 0;
    
    __queue = None;
    __tree = None;
    
    __ccore = False;
    
    def __init__(self, data, number_cluster, number_represent_points = 5, compression = 0.5, ccore = False):
        """!
        @brief Constructor of clustering algorithm CURE.
        
        @param[in] data (list): Input data that is presented as list of points (objects), each point should be represented by list or tuple.
        @param[in] number_cluster (uint): Number of clusters that should be allocated.
        @param[in] number_represent_points (uint): Number of representative points for each cluster.
        @param[in] compression (double): Coefficient defines level of shrinking of representation points toward the mean of the new created cluster after merging on each step. Usually it destributed from 0 to 1.
        @param[in] ccore (bool): If True than DLL CCORE (C++ solution) will be used for solving.
        
        """
                
        self.__pointer_data = data;
        self.__number_cluster = number_cluster;
        self.__number_represent_points = number_represent_points;
        self.__compression = compression;
        
        self.__ccore = ccore;
        
        if (ccore is False):
            self.__create_queue();      # queue
            self.__create_kdtree();     # create k-d tree

    
    def process(self):
        """!
        @brief Performs cluster analysis in line with rules of CURE algorithm.
        
        @remark Results of clustering can be obtained using corresponding get methods.
        
        @see get_clusters()
        
        """
        
        if (self.__ccore is True):
            self.__clusters = wrapper.cure(self.__pointer_data, self.__number_cluster, self.__number_represent_points, self.__compression);
        else:

            while (len(self.__queue) > self.__number_cluster):
                cluster1 = self.__queue[0];            # cluster that has nearest neighbor.
                cluster2 = cluster1.closest;    # closest cluster.
                
                #print("Merge decision: \n\t", cluster1, "\n\t", cluster2);
                
                self.__queue.remove(cluster1);
                self.__queue.remove(cluster2);
                
                self.__delete_represented_points(cluster1);
                self.__delete_represented_points(cluster2);
        
                merged_cluster = self.__merge_clusters(cluster1, cluster2);
        
                self.__insert_represented_points(merged_cluster);
                
                # Pointers to clusters that should be relocated is stored here.
                cluster_relocation_requests = [];
                
                # Check for the last cluster
                if (len(self.__queue) > 0):
                    merged_cluster.closest = self.__queue[0];  # arbitrary cluster from queue
                    merged_cluster.distance = self.__cluster_distance(merged_cluster, merged_cluster.closest);
                    
                    for item in self.__queue:
                        distance = self.__cluster_distance(merged_cluster, item);
                        # Check if distance between new cluster and current is the best than now.
                        if (distance < merged_cluster.distance):
                            merged_cluster.closest = item;
                            merged_cluster.distance = distance;
                        
                        # Check if current cluster has removed neighbor.
                        if ( (item.closest is cluster1) or (item.closest is cluster2) ):
                            # If previous distance was less then distance to new cluster then nearest cluster should be found in the tree.
                            #print("Update: ", item);
                            if (item.distance < distance):
                                (item.closest, item.distance) = self.__closest_cluster(item, distance);
                                
                                # TODO: investigation of root cause is required.
                                # Itself and merged cluster should be always in list of neighbors in line with specified radius.
                                # But merged cluster may not be in list due to error calculation, therefore it should be added
                                # manually.
                                if (item.closest is None):
                                    item.closest = merged_cluster;
                                    item.distance = distance;
                                
                            # Otherwise new cluster is nearest.
                            else:
                                item.closest = merged_cluster;
                                item.distance = distance;
                            
                            cluster_relocation_requests.append(item);
                        elif (item.distance > distance):
                            item.closest = merged_cluster;
                            item.ditance = distance;
                            
                            cluster_relocation_requests.append(item);
                
                # New cluster and updated clusters should relocated in queue
                self.__insert_cluster(merged_cluster);
                [self.__relocate_cluster(item) for item in cluster_relocation_requests];

        
            # Change cluster representation
            self.__clusters = [ cure_cluster_unit.points for cure_cluster_unit in self.__queue ];
    
    
    def get_clusters(self):
        """!
        @brief Returns list of allocated clusters, each cluster contains indexes of objects in list of data.
        
        @return (list) List of allocated clusters.
        
        @see process()
        
        """
        
        return self.__clusters;  
    
    
    def __insert_cluster(self, cluster):
        """!
        @brief Insert cluster to the list (sorted queue) in line with sequence order (distance).
        
        @param[in] cluster (cure_cluster): Cluster that should be inserted.
        
        """
        
        for index in range(len(self.__queue)):
            if (cluster.distance < self.__queue[index].distance):
                self.__queue.insert(index, cluster);
                return;
    
        self.__queue.append(cluster);        


    def __relocate_cluster(self, cluster):
        """!
        @brief Relocate cluster in list in line with distance order.
        
        @param[in] cluster (cure_cluster): Cluster that should be relocated in line with order.
        
        """
        
        self.__queue.remove(cluster);
        self.__insert_cluster(cluster);


    def __closest_cluster(self, cluster, distance):
        """!
        @brief Find closest cluster to the specified cluster in line with distance.
        
        @param[in] cluster (cure_cluster): Cluster for which nearest cluster should be found.
        @param[in] distance (double): Closest distance to the previous cluster.
        
        @return (tuple) Pair (nearest CURE cluster, nearest distance) if the nearest cluster has been found, otherwise None is returned.
        
        """
        
        nearest_cluster = None;
        nearest_distance = float('inf');
        
        for point in cluster.rep:
            # Nearest nodes should be returned (at least it will return itself).
            nearest_nodes = self.__tree.find_nearest_dist_nodes(point, distance);
            for (candidate_distance, kdtree_node) in nearest_nodes:
                if ( (candidate_distance < nearest_distance) and (kdtree_node is not None) and (kdtree_node.payload is not cluster) ):
                    nearest_distance = candidate_distance;
                    nearest_cluster = kdtree_node.payload;
                    
        return (nearest_cluster, nearest_distance);


    def __insert_represented_points(self, cluster):
        """!
        @brief Insert representation points to the k-d tree.
        
        @param[in] cluster (cure_cluster): Cluster whose representation points should be inserted.
        
        """
        
        for point in cluster.rep:
            self.__tree.insert(point, cluster);


    def __delete_represented_points(self, cluster): 
        """!
        @brief Remove representation points of clusters from the k-d tree
        
        @param[in] cluster (cure_cluster): Cluster whose representation points should be removed.
        
        """
        
        for point in cluster.rep:
            self.__tree.remove(point);


    def __merge_clusters(self, cluster1, cluster2):
        """!
        @brief Merges two clusters and returns new merged cluster. Representation points and mean points are calculated for the new cluster.
        
        @param[in] cluster1 (cure_cluster): Cluster that should be merged.
        @param[in] cluster2 (cure_cluster): Cluster that should be merged.
        
        @return (cure_cluster) New merged CURE cluster.
        
        """
        
        merged_cluster = cure_cluster();
        
        merged_cluster.points = cluster1.points + cluster2.points;
        
        # merged_cluster.mean = ( len(cluster1.points) * cluster1.mean + len(cluster2.points) * cluster2.mean ) / ( len(cluster1.points) + len(cluster2.points) );
        dimension = len(cluster1.mean);
        merged_cluster.mean = [0] * dimension;
        if merged_cluster.points[1:] == merged_cluster.points[:-1]:
            merged_cluster.mean = merged_cluster.points[0]
        else:
            for index in range(dimension):
                merged_cluster.mean[index] = ( len(cluster1.points) * cluster1.mean[index] + len(cluster2.points) * cluster2.mean[index] ) / ( len(cluster1.points) + len(cluster2.points) );
        
        temporary = list(); # TODO: Set should be used in line with specification (article), but list is not hashable object therefore it's impossible to use list in this fucking set!
        
        for index in range(self.__number_represent_points):
            maximal_distance = 0;
            maximal_point = None;
            
            for point in merged_cluster.points:
                minimal_distance = 0;
                if (index == 0):
                    minimal_distance = euclidean_distance(point, merged_cluster.mean);
                    #minimal_distance = euclidean_distance_sqrt(point, merged_cluster.mean);
                else:
                    minimal_distance = euclidean_distance(point, temporary[0]);
                    #minimal_distance = cluster_distance(cure_cluster(point), cure_cluster(temporary[0]));
                    
                if (minimal_distance >= maximal_distance):
                    maximal_distance = minimal_distance;
                    maximal_point = point;
        
            if (maximal_point not in temporary):
                temporary.append(maximal_point);
                
        for point in temporary:
            representative_point = [0] * dimension;
            for index in range(dimension):
                representative_point[index] = point[index] + self.__compression * (merged_cluster.mean[index] - point[index]);
                
            merged_cluster.rep.append(representative_point);
        
        return merged_cluster;


    def __create_queue(self):
        """!
        @brief Create queue of sorted clusters by distance between them, where first cluster has the nearest neighbor. At the first iteration each cluster contains only one point.
        
        @param[in] data (list): Input data that is presented as list of points (objects), each point should be represented by list or tuple.
        
        @return (list) Create queue of sorted clusters by distance between them.
        
        """
        
        self.__queue = [cure_cluster(point) for point in self.__pointer_data];
        
        # set closest clusters
        for i in range(0, len(self.__queue)):
            minimal_distance = float('inf');
            closest_index_cluster = -1;
            
            for k in range(0, len(self.__queue)):
                if (i != k):
                    dist = self.__cluster_distance(self.__queue[i], self.__queue[k]);
                    if (dist < minimal_distance):
                        minimal_distance = dist;
                        closest_index_cluster = k;
            
            self.__queue[i].closest = self.__queue[closest_index_cluster];
            self.__queue[i].distance = minimal_distance;
        
        # sort clusters
        self.__queue.sort(key = lambda x: x.distance, reverse = False);
    

    def __create_kdtree(self):
        """!
        @brief Create k-d tree in line with created clusters. At the first iteration contains all points from the input data set.
        
        @return (kdtree) k-d tree that consist of representative points of CURE clusters.
        
        """
        
        self.__tree = kdtree();
        for current_cluster in self.__queue:
            for representative_point in current_cluster.rep:
                self.__tree.insert(representative_point, current_cluster);    


    def __cluster_distance(self, cluster1, cluster2):
        """!
        @brief Calculate minimal distance between clusters using representative points.
        
        @param[in] cluster1 (cure_cluster): The first cluster.
        @param[in] cluster2 (cure_cluster): The second cluster.
        
        @return (double) Euclidean distance between two clusters that is defined by minimum distance between representation points of two clusters.
        
        """
        
        distance = float('inf');
        for i in range(0, len(cluster1.rep)):
            for k in range(0, len(cluster2.rep)):
                #dist = euclidean_distance_sqrt(cluster1.rep[i], cluster2.rep[k]);   # Fast mode
                dist = euclidean_distance(cluster1.rep[i], cluster2.rep[k]);        # Slow mode
                if (dist < distance):
                    distance = dist;
                    
        return distance;


1			"""!
2
3			@brief Cluster analysis algorithm: CURE
4			@details Implementation based on article:
5			- S.Guha, R.Rastogi, K.Shim. CURE: An Efficient Clustering Algorithm for Large Databases. 1998.
6
7			@authors Andrei Novikov ([email protected])
8			@date 2014-2016
9			@copyright GNU Public License
10
11			@cond GNU_PUBLIC_LICENSE
12			PyClustering is free software: you can redistribute it and/or modify
13			it under the terms of the GNU General Public License as published by
14			the Free Software Foundation, either version 3 of the License, or
15			(at your option) any later version.
16
17			PyClustering is distributed in the hope that it will be useful,
18			but WITHOUT ANY WARRANTY; without even the implied warranty of
19			MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
20			GNU General Public License for more details.
21
22			You should have received a copy of the GNU General Public License
23			along with this program. If not, see <http://www.gnu.org/licenses/>.
24			@endcond
25
26			"""
27
28			from pyclustering.utils import euclidean_distance;
29			from pyclustering.utils import euclidean_distance_sqrt;
			0 ignored issues – show Unused Code introduced 2016-03-31 09:26 UTC by Report Bug Copy Issue Report Unused euclidean_distance_sqrt imported from pyclustering.utils Loading history...
30
31			from pyclustering.container.kdtree import kdtree;
32
33			import pyclustering.core.wrapper as wrapper;
34
35			class cure_cluster:
36			"""!
37			@brief Represents data cluster in CURE term. CURE cluster is described by points of cluster, representation points of the cluster and by the cluster center.
38
39			"""
40
41			## List of points that make up cluster.
42			points = None;
43
44			## Mean of points that make up cluster.
45			mean = None;
46
47			## List of points that represents clusters.
48			rep = None;
49
50			## Pointer to the closest cluster.
51			closest = None;
52
53			## Distance to the closest cluster.
54			distance = None;
55
56			def __init__(self, point = None):
57			"""!
58			@brief Constructor of CURE cluster.
59
60			@param[in] point (list): Point represented by list of coordinates.
61
62			"""
63			if (point is not None):
64			self.points = [ point ];
65			self.mean = point;
66			self.rep = [ point ];
67			else:
68			self.points = [ ];
69			self.mean = None;
70			self.rep = [ ];
71
72			self.closest = None;
73			self.distance = float('inf'); # calculation of distance is really complexity operation (even square distance), so let's store distance to closest cluster.
74
75			def __repr__(self):
76			"""!
77			@brief Displays distance to closest cluster and points that are contained by current cluster.
78
79			"""
80			return "%s, %s" % (self.distance, self.points);
81
82
83			class cure:
84			"""!
85			@brief Class represents clustering algorithm CURE.
86
87			Example:
88			@code
89			# read data for clustering from some file
90			sample = read_sample(path_to_data);
91
92			# create instance of cure algorithm for cluster analysis
93			# request for allocation of two clusters.
94			cure_instance = cure(sample, 2, 5, 0.5, True);
95
96			# run cluster analysis
97			cure_instance.process();
98
99			# get results of clustering
100			clusters = cure_instance.get_clusters();
101			@endcode
102
103			"""
104			__pointer_data = None;
105			__clusters = None;
106
107			__number_cluster = 0;
108			__number_represent_points = 0;
109			__compression = 0;
110
111			__queue = None;
112			__tree = None;
113
114			__ccore = False;
115
116			def __init__(self, data, number_cluster, number_represent_points = 5, compression = 0.5, ccore = False):
117			"""!
118			@brief Constructor of clustering algorithm CURE.
119
120			@param[in] data (list): Input data that is presented as list of points (objects), each point should be represented by list or tuple.
121			@param[in] number_cluster (uint): Number of clusters that should be allocated.
122			@param[in] number_represent_points (uint): Number of representative points for each cluster.
123			@param[in] compression (double): Coefficient defines level of shrinking of representation points toward the mean of the new created cluster after merging on each step. Usually it destributed from 0 to 1.
124			@param[in] ccore (bool): If True than DLL CCORE (C++ solution) will be used for solving.
125
126			"""
127
128			self.__pointer_data = data;
129			self.__number_cluster = number_cluster;
130			self.__number_represent_points = number_represent_points;
131			self.__compression = compression;
132
133			self.__ccore = ccore;
134
135			if (ccore is False):
136			self.__create_queue(); # queue
137			self.__create_kdtree(); # create k-d tree
138
139
140			def process(self):
141			"""!
142			@brief Performs cluster analysis in line with rules of CURE algorithm.
143
144			@remark Results of clustering can be obtained using corresponding get methods.
145
146			@see get_clusters()
147
148			"""
149
150			if (self.__ccore is True):
151			self.__clusters = wrapper.cure(self.__pointer_data, self.__number_cluster, self.__number_represent_points, self.__compression);
152			else:
153
154			while (len(self.__queue) > self.__number_cluster):
155			cluster1 = self.__queue[0]; # cluster that has nearest neighbor.
156			cluster2 = cluster1.closest; # closest cluster.
157
158			#print("Merge decision: \n\t", cluster1, "\n\t", cluster2);
159
160			self.__queue.remove(cluster1);
161			self.__queue.remove(cluster2);
162
163			self.__delete_represented_points(cluster1);
164			self.__delete_represented_points(cluster2);
165
166			merged_cluster = self.__merge_clusters(cluster1, cluster2);
167
168			self.__insert_represented_points(merged_cluster);
169
170			# Pointers to clusters that should be relocated is stored here.
171			cluster_relocation_requests = [];
172
173			# Check for the last cluster
174			if (len(self.__queue) > 0):
175			merged_cluster.closest = self.__queue[0]; # arbitrary cluster from queue
176			merged_cluster.distance = self.__cluster_distance(merged_cluster, merged_cluster.closest);
177
178			for item in self.__queue:
179			distance = self.__cluster_distance(merged_cluster, item);
180			# Check if distance between new cluster and current is the best than now.
181			if (distance < merged_cluster.distance):
182			merged_cluster.closest = item;
183			merged_cluster.distance = distance;
184
185			# Check if current cluster has removed neighbor.
186			if ( (item.closest is cluster1) or (item.closest is cluster2) ):
187			# If previous distance was less then distance to new cluster then nearest cluster should be found in the tree.
188			#print("Update: ", item);
189			if (item.distance < distance):
190			(item.closest, item.distance) = self.__closest_cluster(item, distance);
191
192			# TODO: investigation of root cause is required.
193			# Itself and merged cluster should be always in list of neighbors in line with specified radius.
194			# But merged cluster may not be in list due to error calculation, therefore it should be added
195			# manually.
196			if (item.closest is None):
197			item.closest = merged_cluster;
198			item.distance = distance;
199
200			# Otherwise new cluster is nearest.
201			else:
202			item.closest = merged_cluster;
203			item.distance = distance;
204
205			cluster_relocation_requests.append(item);
206			elif (item.distance > distance):
207			item.closest = merged_cluster;
208			item.ditance = distance;
209
210			cluster_relocation_requests.append(item);
211
212			# New cluster and updated clusters should relocated in queue
213			self.__insert_cluster(merged_cluster);
214			[self.__relocate_cluster(item) for item in cluster_relocation_requests];
			0 ignored issues – show Unused Code Bug introduced 2016-03-31 09:26 UTC by Report Bug Copy Issue Report The expression `[self.__relocate_cluster...er_relocation_requests]` does not seem to have sideeffects and its result is not used. If a expression has no sideeffects (any lasting effect after it has been called) and its return value is not used, this usually means that this code can be removed or that an assignment is missing. Loading history...
215
216			# Change cluster representation
217			self.__clusters = [ cure_cluster_unit.points for cure_cluster_unit in self.__queue ];
218
219
220			def get_clusters(self):
221			"""!
222			@brief Returns list of allocated clusters, each cluster contains indexes of objects in list of data.
223
224			@return (list) List of allocated clusters.
225
226			@see process()
227
228			"""
229
230			return self.__clusters;
231
232
233			def __insert_cluster(self, cluster):
234			"""!
235			@brief Insert cluster to the list (sorted queue) in line with sequence order (distance).
236
237			@param[in] cluster (cure_cluster): Cluster that should be inserted.
238
239			"""
240
241			for index in range(len(self.__queue)):
242			if (cluster.distance < self.__queue[index].distance):
243			self.__queue.insert(index, cluster);
244			return;
245
246			self.__queue.append(cluster);
247
248
249			def __relocate_cluster(self, cluster):
250			"""!
251			@brief Relocate cluster in list in line with distance order.
252
253			@param[in] cluster (cure_cluster): Cluster that should be relocated in line with order.
254
255			"""
256
257			self.__queue.remove(cluster);
258			self.__insert_cluster(cluster);
259
260
261			def __closest_cluster(self, cluster, distance):
262			"""!
263			@brief Find closest cluster to the specified cluster in line with distance.
264
265			@param[in] cluster (cure_cluster): Cluster for which nearest cluster should be found.
266			@param[in] distance (double): Closest distance to the previous cluster.
267
268			@return (tuple) Pair (nearest CURE cluster, nearest distance) if the nearest cluster has been found, otherwise None is returned.
269
270			"""
271
272			nearest_cluster = None;
273			nearest_distance = float('inf');
274
275			for point in cluster.rep:
276			# Nearest nodes should be returned (at least it will return itself).
277			nearest_nodes = self.__tree.find_nearest_dist_nodes(point, distance);
278			for (candidate_distance, kdtree_node) in nearest_nodes:
279			if ( (candidate_distance < nearest_distance) and (kdtree_node is not None) and (kdtree_node.payload is not cluster) ):
280			nearest_distance = candidate_distance;
281			nearest_cluster = kdtree_node.payload;
282
283			return (nearest_cluster, nearest_distance);
284
285
286			def __insert_represented_points(self, cluster):
287			"""!
288			@brief Insert representation points to the k-d tree.
289
290			@param[in] cluster (cure_cluster): Cluster whose representation points should be inserted.
291
292			"""
293
294			for point in cluster.rep:
295			self.__tree.insert(point, cluster);
296
297
298			def __delete_represented_points(self, cluster):
299			"""!
300			@brief Remove representation points of clusters from the k-d tree
301
302			@param[in] cluster (cure_cluster): Cluster whose representation points should be removed.
303
304			"""
305
306			for point in cluster.rep:
307			self.__tree.remove(point);
308
309
310			def __merge_clusters(self, cluster1, cluster2):
311			"""!
312			@brief Merges two clusters and returns new merged cluster. Representation points and mean points are calculated for the new cluster.
313
314			@param[in] cluster1 (cure_cluster): Cluster that should be merged.
315			@param[in] cluster2 (cure_cluster): Cluster that should be merged.
316
317			@return (cure_cluster) New merged CURE cluster.
318
319			"""
320
321			merged_cluster = cure_cluster();
322
323			merged_cluster.points = cluster1.points + cluster2.points;
324
325			# merged_cluster.mean = ( len(cluster1.points) * cluster1.mean + len(cluster2.points) * cluster2.mean ) / ( len(cluster1.points) + len(cluster2.points) );
326			dimension = len(cluster1.mean);
327			merged_cluster.mean = [0] * dimension;
328			if merged_cluster.points[1:] == merged_cluster.points[:-1]:
329			merged_cluster.mean = merged_cluster.points[0]
330			else:
331			for index in range(dimension):
332			merged_cluster.mean[index] = ( len(cluster1.points) * cluster1.mean[index] + len(cluster2.points) * cluster2.mean[index] ) / ( len(cluster1.points) + len(cluster2.points) );
333
334			temporary = list(); # TODO: Set should be used in line with specification (article), but list is not hashable object therefore it's impossible to use list in this fucking set!
335
336			for index in range(self.__number_represent_points):
337			maximal_distance = 0;
338			maximal_point = None;
339
340			for point in merged_cluster.points:
341			minimal_distance = 0;
342			if (index == 0):
343			minimal_distance = euclidean_distance(point, merged_cluster.mean);
344			#minimal_distance = euclidean_distance_sqrt(point, merged_cluster.mean);
345			else:
346			minimal_distance = euclidean_distance(point, temporary[0]);
347			#minimal_distance = cluster_distance(cure_cluster(point), cure_cluster(temporary[0]));
348
349			if (minimal_distance >= maximal_distance):
350			maximal_distance = minimal_distance;
351			maximal_point = point;
352
353			if (maximal_point not in temporary):
354			temporary.append(maximal_point);
355
356			for point in temporary:
357			representative_point = [0] * dimension;
358			for index in range(dimension):
359			representative_point[index] = point[index] + self.__compression * (merged_cluster.mean[index] - point[index]);
360
361			merged_cluster.rep.append(representative_point);
362
363			return merged_cluster;
364
365
366			def __create_queue(self):
367			"""!
368			@brief Create queue of sorted clusters by distance between them, where first cluster has the nearest neighbor. At the first iteration each cluster contains only one point.
369
370			@param[in] data (list): Input data that is presented as list of points (objects), each point should be represented by list or tuple.
371
372			@return (list) Create queue of sorted clusters by distance between them.
373
374			"""
375
376			self.__queue = [cure_cluster(point) for point in self.__pointer_data];
377
378			# set closest clusters
379			for i in range(0, len(self.__queue)):
380			minimal_distance = float('inf');
381			closest_index_cluster = -1;
382
383			for k in range(0, len(self.__queue)):
384			if (i != k):
385			dist = self.__cluster_distance(self.__queue[i], self.__queue[k]);
386			if (dist < minimal_distance):
387			minimal_distance = dist;
388			closest_index_cluster = k;
389
390			self.__queue[i].closest = self.__queue[closest_index_cluster];
391			self.__queue[i].distance = minimal_distance;
392
393			# sort clusters
394			self.__queue.sort(key = lambda x: x.distance, reverse = False);
395
396
397			def __create_kdtree(self):
398			"""!
399			@brief Create k-d tree in line with created clusters. At the first iteration contains all points from the input data set.
400
401			@return (kdtree) k-d tree that consist of representative points of CURE clusters.
402
403			"""
404
405			self.__tree = kdtree();
406			for current_cluster in self.__queue:
407			for representative_point in current_cluster.rep:
408			self.__tree.insert(representative_point, current_cluster);
409
410
411			def __cluster_distance(self, cluster1, cluster2):
412			"""!
413			@brief Calculate minimal distance between clusters using representative points.
414
415			@param[in] cluster1 (cure_cluster): The first cluster.
416			@param[in] cluster2 (cure_cluster): The second cluster.
417
418			@return (double) Euclidean distance between two clusters that is defined by minimum distance between representation points of two clusters.
419
420			"""
421
422			distance = float('inf');
423			for i in range(0, len(cluster1.rep)):
424			for k in range(0, len(cluster2.rep)):
425			#dist = euclidean_distance_sqrt(cluster1.rep[i], cluster2.rep[k]); # Fast mode
426			dist = euclidean_distance(cluster1.rep[i], cluster2.rep[k]); # Slow mode
427			if (dist < distance):
428			distance = dist;
429
430			return distance;
431

annoviko / pyclustering

Push — master ( a63a82...61f52c )

pyclustering.cluster.cure C

Complexity

Size/Duplication

12 Methods

How to fix Complexity

Complex Class

Duplication Side-by-Side

Filter issues like