kmeans_visualizer - Code Metrics - Inspection of "[.] Refactoring of 'cluster_data' and its children..." - annoviko/pyclustering - Measure and Improve Code Quality continuously with Scrutinizer

Completed

Push — 0.8.dev ( 2aeeeb...e36293 )

by Andrei

created 2018-02-28 13:50 UTC

kmeans_visualizer A

↳ Parent: Project

Complexity

Total Complexity

Size/Duplication

Total Lines	100
Duplicated Lines	0 %

Importance

Changes

Metric	Value
dl	0
loc	100
rs	10
c	0
b	0
f	0
wmc	19

6 Methods

Rating	Name	Size	Complexity
A	__draw_rays()	7	2
A	__draw_centers()	10	3
A	__draw_center()	10	4
B	show_clusters()	34	3
A	__get_argument()	6	2
B	__draw_cluster_rays()	12	5

"""!

@brief Cluster analysis algorithm: K-Means
@details Based on book description:
         - J.B.MacQueen. Some Methods for Classification and Analysis of Multivariate Observations. 1967.

@authors Andrei Novikov ([email protected])
@date 2014-2018
@copyright GNU Public License

@cond GNU_PUBLIC_LICENSE
    PyClustering is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.
    
    PyClustering is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.
    
    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
@endcond

"""


import numpy;
# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3

import matplotlib.pyplot as plt;
# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3

import pyclustering.core.kmeans_wrapper as wrapper;

from pyclustering.core.wrapper import ccore_library;

from pyclustering.cluster.encoder import type_encoding;
from pyclustering.cluster import cluster_visualizer;


class kmeans_observer:
    """!
    @brief Observer of K-Means algorithm that is used to collect information about clustering process on each iteration of the algorithm.
    
    @see kmeans
    
    """
    
    def __init__(self, evolution_clusters = [], evolution_centers = []):
# Bad:
# If array_param is modified inside the function, the next invocation will
# receive the modified object.
def some_function(array_param=[]):
    # ...

# Better: Create an array on each invocation
def some_function(array_param=None):
    array_param = array_param or []
    # ...
        """!
        @brief Initializer of observer of K-Means algorithm.
        
        @param[in] evolution_clusters (array_like): Evolution of clusters changes during clustering process.
        @param[in] evolution_centers (array_like): Evolution of centers changes during clustering process.
        
        """
        self.__evolution_clusters   = [];
        self.__evolution_centers    = [];


    def __len__(self):
        """!
        @brief Returns amount of steps that were observer during clustering process in K-Means algorithm.
        
        """
        return len(self.__evolution_clusters);


    def notify(self, clusters, centers):
        """!
        @brief This method is called by K-Means algorithm to notify about changes.
        
        @param[in] clusters (array_like): Allocated clusters by K-Means algorithm.
        @param[in] centers (array_like): Allocated centers by K-Means algorithm.
        
        """
        self.__evolution_clusters.append(clusters);
        self.__evolution_clusters.append(centers);


    def get_centers(self, iteration):
        """!
        @brief Get method to return centers at specific iteration of clustering process.
        
        @param[in] iteration (uint): Clustering process iteration at which centers are required.
        
        @return (array_like) Centers at specific iteration.
        
        """
        return self.__evolution_centers[iteration];


    def get_clusters(self, iteration):
        """!
        @brief Get method to return allocated clusters at specific iteration of clustering process.
        
        @param[in] iteration (uint): Clustering process iteration at which clusters are required.
        
        @return (array_like) Clusters at specific iteration.
        
        """
        return self.__evolution_clusters[iteration];



class kmeans_visualizer:
    """!
    @brief Visualizer of K-Means algorithm's results.
    @details K-Means visualizer provides visualization services that are specific for K-Means algorithm.
    
    """
    
    __default_2d_marker_size = 15;
    __default_3d_marker_size = 70;
    
    
    @staticmethod
    def show_clusters(sample, clusters, centers, initial_centers = None, **kwargs):
        """!
        @brief 
        
        @param[in] sample (list): Dataset that were used for clustering.
        @param[in] clusters (array_like): Clusters that were allocated by the algorithm.
        @param[in] centers (array_like): Centers that were allocated by the algorithm.
        @param[in] initial_centers (array_like): Initial centers that were used by the algorithm, if 'None' then initial centers are not displyed.
        @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'figure', 'display').
        
        Keyword Args:
            figure (figure): If 'None' then new is figure is creater, otherwise specified figure is used for visualization.
            display (bool): If 'True' then figure will be shown by the method, otherwise it should be shown manually using matplotlib function 'plt.show()'.
        
        @return (figure) Figure where clusters were drawn.
        
        """

        visualizer = cluster_visualizer();
        visualizer.append_clusters(clusters, sample);
        
        if (kmeans_visualizer.__get_argument('figure', None, **kwargs) is None):
            figure = visualizer.show(display = False);
        else:
            visualizer.show(figure = figure, display = False);
        
        kmeans_visualizer.__draw_centers(figure, visualizer, centers, initial_centers);
        kmeans_visualizer.__draw_rays(figure, visualizer, sample, clusters, centers);
        
        if (kmeans_visualizer.__get_argument('display', True, **kwargs) is True):
            plt.show();

        return figure;


    @staticmethod
    def __draw_rays(figure, visualizer, sample, clusters, centers):
        ax = figure.get_axes()[0];
        
        for index_cluster in range(len(clusters)):
            color = visualizer.get_cluster_color(index_cluster, 0);
            kmeans_visualizer.__draw_cluster_rays(ax, color, sample, clusters[index_cluster], centers[index_cluster])


    @staticmethod
    def __draw_cluster_rays(ax, color, sample, cluster, center):
        dimension = len(sample[0]);
        
        for index_point in cluster:
            point = sample[index_point];
            if (dimension == 1):
                ax.plot([point[0], center[0]], [0.0, 0.0], '-', color=color, linewidth=0.5);
            elif (dimension == 2):
                ax.plot([point[0], center[0]], [point[1], center[1]], '-', color=color, linewidth=0.5);
            elif (dimension == 3):
                ax.plot([point[0], center[0]], [point[1], center[1]], [point[2], center[2]], '-', color=color, linewidth=0.5)


    @staticmethod
    def __draw_center(ax, center, color, marker, alpha):
        dimension = len(center);
        
        if (dimension == 1):
            ax.plot(center[0], 0.0, color=color, alpha=alpha, marker=marker, markersize=kmeans_visualizer.__default_2d_marker_size);
        elif (dimension == 2):
            ax.plot(center[0], center[1], color=color, alpha=alpha, marker=marker, markersize=kmeans_visualizer.__default_2d_marker_size);
        elif (dimension == 3):
            ax.scatter(center[0], center[1], center[2], c=color, alpha=alpha, marker=marker, s=kmeans_visualizer.__default_3d_marker_size);


    @staticmethod
    def __draw_centers(figure, visualizer, centers, initial_centers):
        ax = figure.get_axes()[0];
        
        for index_center in range(len(centers)):
            color = visualizer.get_cluster_color(index_center, 0);
            kmeans_visualizer.__draw_center(ax, centers[index_center], color, '*', 1.0);
            
            if initial_centers:
                kmeans_visualizer.__draw_center(ax, initial_centers[index_center], color, '*', 0.4);


    @staticmethod
    def __get_argument(argument_name, default_value, **kwargs):
        if (argument_name in kwargs):
            return kwargs[argument_name];
        
        return default_value;



class kmeans:
    """!
    @brief Class represents clustering algorithm K-Means.
    @details CCORE option can be used to use the pyclustering core - C/C++ shared library for processing that significantly increases performance.
    
             CCORE implementation of the algorithm uses thread pool to parallelize the clustering process.
             
             K-Means clustering results depend on initial centers. Algorithm K-Means++ can used for initialization 
             initial centers from module 'pyclustering.cluster.center_initializer'.
    
    Example #1 - Trivial clustering:
    @code
        # load list of points for cluster analysis
        sample = read_sample(path);
        
        # create instance of K-Means algorithm
        kmeans_instance = kmeans(sample, [ [0.0, 0.1], [2.5, 2.6] ]);
        
        # run cluster analysis and obtain results
        kmeans_instance.process();
        clusters = kmeans_instance.get_clusters();
    @endcode
    
    Example #2 - Clustering using K-Means++ for center initialization:
    @code
        # load list of points for cluster analysis
        sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE2);
        
        # initialize initial centers using K-Means++ method
        initial_centers = kmeans_plusplus_initializer(sample, 3).initialize();
        
        # create instance of K-Means algorithm with prepared centers
        kmeans_instance = kmeans(sample, initial_centers);
        
        # run cluster analysis and obtain results
        kmeans_instance.process();
        clusters = kmeans_instance.get_clusters();
        final_centers = kmeans_instance.get_centers();
    @endcode
    
    @see center_initializer
    
    """
    
    def __init__(self, data, initial_centers, tolerance = 0.001, ccore = True, **kwargs):
        """!
        @brief Constructor of clustering algorithm K-Means.
        @details Center initializer can be used for creating initial centers, for example, K-Means++ method.
        
        @param[in] data (array_like): Input data that is presented as array of points (objects), each point should be represented by array_like data structure.
        @param[in] initial_centers (array_like): Initial coordinates of centers of clusters that are represented by array_like data structure: [center1, center2, ...].
        @param[in] tolerance (double): Stop condition: if maximum value of change of centers of clusters is less than tolerance then algorithm stops processing.
        @param[in] ccore (bool): Defines should be CCORE library (C++ pyclustering library) used instead of Python code or not.
        @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'observer').
        
        Keyword Args:
            observer (kmeans_observer): Observer of the algorithm to collect information about clustering process on each iteration.
        
        @see center_initializer
        
        """
        self.__pointer_data = numpy.matrix(data);
        self.__clusters = [];
        self.__centers = numpy.matrix(initial_centers);
        self.__tolerance = tolerance;
        
        self.__observer = None;
        if 'observer' in kwargs:
            self.__observer = kwargs['observer'];
        
        self.__ccore = ccore;
        if (self.__ccore):
            self.__ccore = ccore_library.workable();


    def process(self):
        """!
        @brief Performs cluster analysis in line with rules of K-Means algorithm.
        
        @remark Results of clustering can be obtained using corresponding get methods.
        
        @see get_clusters()
        @see get_centers()
        
        """
        
        if (self.__ccore is True):
            results = wrapper.kmeans(self.__pointer_data, self.__centers, self.__tolerance, (self.__observer is not None));
            self.__clusters = results[0];
            self.__centers  = results[1];
            
            if self.__observer:
                self.__observer = kmeans_observer(results[2], results[3]);
        else:
            maximum_change = float('inf');
             
            stop_condition = self.__tolerance * self.__tolerance;   # Fast solution
            #stop_condition = self.__tolerance;              # Slow solution
             
            # Check for dimension
            if (len(self.__pointer_data[0]) != len(self.__centers[0])):
                raise NameError('Dimension of the input data and dimension of the initial cluster centers must be equal.');
             
            while (maximum_change > stop_condition):
                self.__clusters = self.__update_clusters();
                updated_centers = self.__update_centers();  # changes should be calculated before asignment
                
                if self.__observer:
                    self.__observer.notify(self.__clusters, updated_centers);
                
                if (len(self.__centers) != len(updated_centers)):
                    maximum_change = float('inf');
                
                else:
                    changes = numpy.sum(numpy.square(self.__centers - updated_centers), axis=1);
                    maximum_change = numpy.max(changes);
                
                self.__centers = updated_centers;


    def get_clusters(self):
        """!
        @brief Returns list of allocated clusters, each cluster contains indexes of objects in list of data.
        
        @see process()
        @see get_centers()
        
        """
        
        return self.__clusters;
    
    
    def get_centers(self):
        """!
        @brief Returns list of centers of allocated clusters.
        
        @see process()
        @see get_clusters()
        
        """

        return self.__centers.tolist();


    def get_cluster_encoding(self):
        """!
        @brief Returns clustering result representation type that indicate how clusters are encoded.
        
        @return (type_encoding) Clustering result representation.
        
        @see get_clusters()
        
        """
        
        return type_encoding.CLUSTER_INDEX_LIST_SEPARATION;


    def __update_clusters(self):
        """!
        @brief Calculate Euclidean distance to each point from the each cluster. Nearest points are captured by according clusters and as a result clusters are updated.
        
        @return (list) updated clusters as list of clusters. Each cluster contains indexes of objects from data.
        
        """
        
        clusters = [[] for _ in range(len(self.__centers))];
        
        dataset_differences = numpy.zeros((len(clusters), len(self.__pointer_data)));
        for index_center in range(len(self.__centers)):
            dataset_differences[index_center] = numpy.sum(numpy.square(self.__pointer_data - self.__centers[index_center]), axis=1).T;
        
        optimum_indexes = numpy.argmin(dataset_differences, axis=0);
        for index_point in range(len(optimum_indexes)):
            index_cluster = optimum_indexes[index_point];
            clusters[index_cluster].append(index_point);
        
        clusters = [cluster for cluster in clusters if len(cluster) > 0];
        
        return clusters;
    
    
    def __update_centers(self):
        """!
        @brief Calculate centers of clusters in line with contained objects.
        
        @return (numpy.matrix) Updated centers as list of centers.
        
        """
        
        dimension = self.__pointer_data.shape[1];
        centers = numpy.zeros((len(self.__clusters), dimension));
        
        for index in range(len(self.__clusters)):
            cluster_points = self.__pointer_data[self.__clusters[index], :];
            centers[index] = cluster_points.mean(axis=0);

        return numpy.matrix(centers);


Push — 0.8.dev ( 2aeeeb...e36293 )

kmeans_visualizer A

Complexity

Size/Duplication

Importance

6 Methods

1. Missing Dependencies

2. Missing init.py files

1. Missing Dependencies

2. Missing init.py files

1			"""!
2
3			@brief Cluster analysis algorithm: K-Means
4			@details Based on book description:
5			- J.B.MacQueen. Some Methods for Classification and Analysis of Multivariate Observations. 1967.
6
7			@authors Andrei Novikov ([email protected])
8			@date 2014-2018
9			@copyright GNU Public License
10
11			@cond GNU_PUBLIC_LICENSE
12			PyClustering is free software: you can redistribute it and/or modify
13			it under the terms of the GNU General Public License as published by
14			the Free Software Foundation, either version 3 of the License, or
15			(at your option) any later version.
16
17			PyClustering is distributed in the hope that it will be useful,
18			but WITHOUT ANY WARRANTY; without even the implied warranty of
19			MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
20			GNU General Public License for more details.
21
22			You should have received a copy of the GNU General Public License
23			along with this program. If not, see <http://www.gnu.org/licenses/>.
24			@endcond
25
26			"""
27
28
29			import numpy;
			0 ignored issues – show Configuration introduced 2018-02-27 17:16 UTC by Report Bug Copy Issue Report The import `numpy` could not be resolved. This can be caused by one of the following: 1. Missing Dependencies This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands. # .scrutinizer.yml before_commands: - sudo pip install abc # Python2 - sudo pip3 install abc # Python3 Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version. 2. Missing __init__.py files This error could also result from missing `__init__.py` files in your module folders. Make sure that you place one file in each sub-folder. Loading history...
30
31			import matplotlib.pyplot as plt;
			0 ignored issues – show Configuration introduced 2018-02-27 17:16 UTC by Report Bug Copy Issue Report The import `matplotlib.pyplot` could not be resolved. This can be caused by one of the following: 1. Missing Dependencies This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands. # .scrutinizer.yml before_commands: - sudo pip install abc # Python2 - sudo pip3 install abc # Python3 Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version. 2. Missing __init__.py files This error could also result from missing `__init__.py` files in your module folders. Make sure that you place one file in each sub-folder. Loading history...
32
33			import pyclustering.core.kmeans_wrapper as wrapper;
34
35			from pyclustering.core.wrapper import ccore_library;
36
37			from pyclustering.cluster.encoder import type_encoding;
38			from pyclustering.cluster import cluster_visualizer;
39
40
41			class kmeans_observer:
42			"""!
43			@brief Observer of K-Means algorithm that is used to collect information about clustering process on each iteration of the algorithm.
44
45			@see kmeans
46
47			"""
48
49			def __init__(self, evolution_clusters = [], evolution_centers = []):
			0 ignored issues – show Bug Best Practice introduced 2018-02-28 13:51 UTC by Report Bug Copy Issue Report The default value `[]` might cause unintended side-effects. Objects as default values are only created once in Python and not on each invocation of the function. If the default object is modified, this modification is carried over to the next invocation of the method. # Bad: # If array_param is modified inside the function, the next invocation will # receive the modified object. def some_function(array_param=[]): # ... # Better: Create an array on each invocation def some_function(array_param=None): array_param = array_param or [] # ... Loading history... Unused Code introduced 2018-02-28 13:51 UTC by Report Bug Copy Issue Report The argument `evolution_centers` seems to be unused. Loading history... Unused Code introduced 2018-02-28 13:51 UTC by Report Bug Copy Issue Report The argument `evolution_clusters` seems to be unused. Loading history...
50			"""!
51			@brief Initializer of observer of K-Means algorithm.
52
53			@param[in] evolution_clusters (array_like): Evolution of clusters changes during clustering process.
54			@param[in] evolution_centers (array_like): Evolution of centers changes during clustering process.
55
56			"""
57			self.__evolution_clusters = [];
58			self.__evolution_centers = [];
59
60
61			def __len__(self):
62			"""!
63			@brief Returns amount of steps that were observer during clustering process in K-Means algorithm.
64
65			"""
66			return len(self.__evolution_clusters);
67
68
69			def notify(self, clusters, centers):
70			"""!
71			@brief This method is called by K-Means algorithm to notify about changes.
72
73			@param[in] clusters (array_like): Allocated clusters by K-Means algorithm.
74			@param[in] centers (array_like): Allocated centers by K-Means algorithm.
75
76			"""
77			self.__evolution_clusters.append(clusters);
78			self.__evolution_clusters.append(centers);
79
80
81			def get_centers(self, iteration):
82			"""!
83			@brief Get method to return centers at specific iteration of clustering process.
84
85			@param[in] iteration (uint): Clustering process iteration at which centers are required.
86
87			@return (array_like) Centers at specific iteration.
88
89			"""
90			return self.__evolution_centers[iteration];
91
92
93			def get_clusters(self, iteration):
94			"""!
95			@brief Get method to return allocated clusters at specific iteration of clustering process.
96
97			@param[in] iteration (uint): Clustering process iteration at which clusters are required.
98
99			@return (array_like) Clusters at specific iteration.
100
101			"""
102			return self.__evolution_clusters[iteration];
103
104
105
106			class kmeans_visualizer:
107			"""!
108			@brief Visualizer of K-Means algorithm's results.
109			@details K-Means visualizer provides visualization services that are specific for K-Means algorithm.
110
111			"""
112
113			__default_2d_marker_size = 15;
114			__default_3d_marker_size = 70;
115
116
117			@staticmethod
118			def show_clusters(sample, clusters, centers, initial_centers = None, **kwargs):
119			"""!
120			@brief
121
122			@param[in] sample (list): Dataset that were used for clustering.
123			@param[in] clusters (array_like): Clusters that were allocated by the algorithm.
124			@param[in] centers (array_like): Centers that were allocated by the algorithm.
125			@param[in] initial_centers (array_like): Initial centers that were used by the algorithm, if 'None' then initial centers are not displyed.
126			@param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'figure', 'display').
127
128			Keyword Args:
129			figure (figure): If 'None' then new is figure is creater, otherwise specified figure is used for visualization.
130			display (bool): If 'True' then figure will be shown by the method, otherwise it should be shown manually using matplotlib function 'plt.show()'.
131
132			@return (figure) Figure where clusters were drawn.
133
134			"""
135
136			visualizer = cluster_visualizer();
137			visualizer.append_clusters(clusters, sample);
138
139			if (kmeans_visualizer.__get_argument('figure', None, **kwargs) is None):
140			figure = visualizer.show(display = False);
141			else:
142			visualizer.show(figure = figure, display = False);
143
144			kmeans_visualizer.__draw_centers(figure, visualizer, centers, initial_centers);
145			kmeans_visualizer.__draw_rays(figure, visualizer, sample, clusters, centers);
146
147			if (kmeans_visualizer.__get_argument('display', True, **kwargs) is True):
148			plt.show();
149
150			return figure;
151
152
153			@staticmethod
154			def __draw_rays(figure, visualizer, sample, clusters, centers):
155			ax = figure.get_axes()[0];
156
157			for index_cluster in range(len(clusters)):
158			color = visualizer.get_cluster_color(index_cluster, 0);
159			kmeans_visualizer.__draw_cluster_rays(ax, color, sample, clusters[index_cluster], centers[index_cluster])
160
161
162			@staticmethod
163			def __draw_cluster_rays(ax, color, sample, cluster, center):
164			dimension = len(sample[0]);
165
166			for index_point in cluster:
167			point = sample[index_point];
168			if (dimension == 1):
169			ax.plot([point[0], center[0]], [0.0, 0.0], '-', color=color, linewidth=0.5);
170			elif (dimension == 2):
171			ax.plot([point[0], center[0]], [point[1], center[1]], '-', color=color, linewidth=0.5);
172			elif (dimension == 3):
173			ax.plot([point[0], center[0]], [point[1], center[1]], [point[2], center[2]], '-', color=color, linewidth=0.5)
174
175
176			@staticmethod
177			def __draw_center(ax, center, color, marker, alpha):
178			dimension = len(center);
179
180			if (dimension == 1):
181			ax.plot(center[0], 0.0, color=color, alpha=alpha, marker=marker, markersize=kmeans_visualizer.__default_2d_marker_size);
182			elif (dimension == 2):
183			ax.plot(center[0], center[1], color=color, alpha=alpha, marker=marker, markersize=kmeans_visualizer.__default_2d_marker_size);
184			elif (dimension == 3):
185			ax.scatter(center[0], center[1], center[2], c=color, alpha=alpha, marker=marker, s=kmeans_visualizer.__default_3d_marker_size);
186
187
188			@staticmethod
189			def __draw_centers(figure, visualizer, centers, initial_centers):
190			ax = figure.get_axes()[0];
191
192			for index_center in range(len(centers)):
193			color = visualizer.get_cluster_color(index_center, 0);
194			kmeans_visualizer.__draw_center(ax, centers[index_center], color, '*', 1.0);
195
196			if initial_centers:
197			kmeans_visualizer.__draw_center(ax, initial_centers[index_center], color, '*', 0.4);
198
199
200			@staticmethod
201			def __get_argument(argument_name, default_value, **kwargs):
202			if (argument_name in kwargs):
203			return kwargs[argument_name];
204
205			return default_value;
206
207
208
209			class kmeans:
210			"""!
211			@brief Class represents clustering algorithm K-Means.
212			@details CCORE option can be used to use the pyclustering core - C/C++ shared library for processing that significantly increases performance.
213
214			CCORE implementation of the algorithm uses thread pool to parallelize the clustering process.
215
216			K-Means clustering results depend on initial centers. Algorithm K-Means++ can used for initialization
217			initial centers from module 'pyclustering.cluster.center_initializer'.
218
219			Example #1 - Trivial clustering:
220			@code
221			# load list of points for cluster analysis
222			sample = read_sample(path);
223
224			# create instance of K-Means algorithm
225			kmeans_instance = kmeans(sample, [ [0.0, 0.1], [2.5, 2.6] ]);
226
227			# run cluster analysis and obtain results
228			kmeans_instance.process();
229			clusters = kmeans_instance.get_clusters();
230			@endcode
231
232			Example #2 - Clustering using K-Means++ for center initialization:
233			@code
234			# load list of points for cluster analysis
235			sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE2);
236
237			# initialize initial centers using K-Means++ method
238			initial_centers = kmeans_plusplus_initializer(sample, 3).initialize();
239
240			# create instance of K-Means algorithm with prepared centers
241			kmeans_instance = kmeans(sample, initial_centers);
242
243			# run cluster analysis and obtain results
244			kmeans_instance.process();
245			clusters = kmeans_instance.get_clusters();
246			final_centers = kmeans_instance.get_centers();
247			@endcode
248
249			@see center_initializer
250
251			"""
252
253			def __init__(self, data, initial_centers, tolerance = 0.001, ccore = True, **kwargs):
254			"""!
255			@brief Constructor of clustering algorithm K-Means.
256			@details Center initializer can be used for creating initial centers, for example, K-Means++ method.
257
258			@param[in] data (array_like): Input data that is presented as array of points (objects), each point should be represented by array_like data structure.
259			@param[in] initial_centers (array_like): Initial coordinates of centers of clusters that are represented by array_like data structure: [center1, center2, ...].
260			@param[in] tolerance (double): Stop condition: if maximum value of change of centers of clusters is less than tolerance then algorithm stops processing.
261			@param[in] ccore (bool): Defines should be CCORE library (C++ pyclustering library) used instead of Python code or not.
262			@param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'observer').
263
264			Keyword Args:
265			observer (kmeans_observer): Observer of the algorithm to collect information about clustering process on each iteration.
266
267			@see center_initializer
268
269			"""
270			self.__pointer_data = numpy.matrix(data);
271			self.__clusters = [];
272			self.__centers = numpy.matrix(initial_centers);
273			self.__tolerance = tolerance;
274
275			self.__observer = None;
276			if 'observer' in kwargs:
277			self.__observer = kwargs['observer'];
278
279			self.__ccore = ccore;
280			if (self.__ccore):
281			self.__ccore = ccore_library.workable();
282
283
284			def process(self):
285			"""!
286			@brief Performs cluster analysis in line with rules of K-Means algorithm.
287
288			@remark Results of clustering can be obtained using corresponding get methods.
289
290			@see get_clusters()
291			@see get_centers()
292
293			"""
294
295			if (self.__ccore is True):
296			results = wrapper.kmeans(self.__pointer_data, self.__centers, self.__tolerance, (self.__observer is not None));
297			self.__clusters = results[0];
298			self.__centers = results[1];
299
300			if self.__observer:
301			self.__observer = kmeans_observer(results[2], results[3]);
302			else:
303			maximum_change = float('inf');
304
305			stop_condition = self.__tolerance * self.__tolerance; # Fast solution
306			#stop_condition = self.__tolerance; # Slow solution
307
308			# Check for dimension
309			if (len(self.__pointer_data[0]) != len(self.__centers[0])):
310			raise NameError('Dimension of the input data and dimension of the initial cluster centers must be equal.');
311
312			while (maximum_change > stop_condition):
313			self.__clusters = self.__update_clusters();
314			updated_centers = self.__update_centers(); # changes should be calculated before asignment
315
316			if self.__observer:
317			self.__observer.notify(self.__clusters, updated_centers);
318
319			if (len(self.__centers) != len(updated_centers)):
320			maximum_change = float('inf');
321
322			else:
323			changes = numpy.sum(numpy.square(self.__centers - updated_centers), axis=1);
324			maximum_change = numpy.max(changes);
325
326			self.__centers = updated_centers;
327
328
329			def get_clusters(self):
330			"""!
331			@brief Returns list of allocated clusters, each cluster contains indexes of objects in list of data.
332
333			@see process()
334			@see get_centers()
335
336			"""
337
338			return self.__clusters;
339
340
341			def get_centers(self):
342			"""!
343			@brief Returns list of centers of allocated clusters.
344
345			@see process()
346			@see get_clusters()
347
348			"""
349
350			return self.__centers.tolist();
351
352
353			def get_cluster_encoding(self):
354			"""!
355			@brief Returns clustering result representation type that indicate how clusters are encoded.
356
357			@return (type_encoding) Clustering result representation.
358
359			@see get_clusters()
360
361			"""
362
363			return type_encoding.CLUSTER_INDEX_LIST_SEPARATION;
364
365
366			def __update_clusters(self):
367			"""!
368			@brief Calculate Euclidean distance to each point from the each cluster. Nearest points are captured by according clusters and as a result clusters are updated.
369
370			@return (list) updated clusters as list of clusters. Each cluster contains indexes of objects from data.
371
372			"""
373
374			clusters = [[] for _ in range(len(self.__centers))];
375
376			dataset_differences = numpy.zeros((len(clusters), len(self.__pointer_data)));
377			for index_center in range(len(self.__centers)):
378			dataset_differences[index_center] = numpy.sum(numpy.square(self.__pointer_data - self.__centers[index_center]), axis=1).T;
379
380			optimum_indexes = numpy.argmin(dataset_differences, axis=0);
381			for index_point in range(len(optimum_indexes)):
382			index_cluster = optimum_indexes[index_point];
383			clusters[index_cluster].append(index_point);
384
385			clusters = [cluster for cluster in clusters if len(cluster) > 0];
386
387			return clusters;
388
389
390			def __update_centers(self):
391			"""!
392			@brief Calculate centers of clusters in line with contained objects.
393
394			@return (numpy.matrix) Updated centers as list of centers.
395
396			"""
397
398			dimension = self.__pointer_data.shape[1];
399			centers = numpy.zeros((len(self.__clusters), dimension));
400
401			for index in range(len(self.__clusters)):
402			cluster_points = self.__pointer_data[self.__clusters[index], :];
403			centers[index] = cluster_points.mean(axis=0);
404
405			return numpy.matrix(centers);
406

annoviko / pyclustering

Push — 0.8.dev ( 2aeeeb...e36293 )

kmeans_visualizer A

Complexity

Size/Duplication

Importance

6 Methods

1. Missing Dependencies

2. Missing __init__.py files

1. Missing Dependencies

2. Missing __init__.py files

Duplication Side-by-Side

Filter issues like

2. Missing init.py files

2. Missing init.py files