kmeans_visualizer - Code Metrics - Inspection of "#398: BSAS, MBSAS, TTSAS final changes." - annoviko/pyclustering - Measure and Improve Code Quality continuously with Scrutinizer

Completed

Push — master ( 0ab444...af3192 )

by Andrei

created 2018-04-24 12:08 UTC

kmeans_visualizer A

↳ Parent: Project

Complexity

Total Complexity

Size/Duplication

Total Lines	137
Duplicated Lines	23.36 %

Importance

Changes

Metric	Value
dl	32
loc	137
rs	10
c	0
b	0
f	0
wmc	21

8 Methods

Rating	Name	Duplication	Size	Complexity
A	__draw_rays()	0	7	2
A	__draw_centers()	0	10	3
A	__draw_center()	0	10	4
B	show_clusters()	0	39	3
A	init_frame()	2	2	1
A	frame_generation()	12	12	1
B	animate_cluster_allocation()	32	38	4
B	__draw_cluster_rays()	0	12	5

How to fix Duplicated Code

"""!

@brief Cluster analysis algorithm: K-Means
@details Based on book description:
         - J.B.MacQueen. Some Methods for Classification and Analysis of Multivariate Observations. 1967.

@authors Andrei Novikov ([email protected])
@date 2014-2018
@copyright GNU Public License

@cond GNU_PUBLIC_LICENSE
    PyClustering is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.
    
    PyClustering is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.
    
    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
@endcond

"""


import numpy;
# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3

import matplotlib.pyplot as plt;
# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3
import matplotlib.animation as animation;
# .scrutinizer.yml
before_commands:
    - sudo pip install abc # Python2
    - sudo pip3 install abc # Python3

import pyclustering.core.kmeans_wrapper as wrapper;

from pyclustering.core.wrapper import ccore_library;

from pyclustering.cluster.encoder import type_encoding;
from pyclustering.cluster import cluster_visualizer;


class kmeans_observer:
    """!
    @brief Observer of K-Means algorithm that is used to collect information about clustering process on each iteration of the algorithm.
    
    @see kmeans
    
    """
    
    def __init__(self):
        """!
        @brief Initializer of observer of K-Means algorithm.
        
        """
        self.__evolution_clusters   = [];
        self.__evolution_centers    = [];
        self.__initial_centers      = [];


    def __len__(self):
        """!
        @brief Returns amount of steps that were observer during clustering process in K-Means algorithm.
        
        """
        return len(self.__evolution_clusters);


    def notify(self, clusters, centers):
        """!
        @brief This method is called by K-Means algorithm to notify about changes.
        
        @param[in] clusters (array_like): Allocated clusters by K-Means algorithm.
        @param[in] centers (array_like): Allocated centers by K-Means algorithm.
        
        """
        self.__evolution_clusters.append(clusters);
        self.__evolution_centers.append(centers);


    def set_evolution_centers(self, evolution_centers):
        """!
        @brief Set evolution of changes of centers during clustering process.
        
        @param[in] evolution_centers (array_like): Evolution of changes of centers during clustering process.
        
        """
        self.__evolution_centers = evolution_centers;


    def get_centers(self, iteration):
        """!
        @brief Get method to return centers at specific iteration of clustering process.
        
        @param[in] iteration (uint): Clustering process iteration at which centers are required.
        
        @return (array_like) Centers at specific iteration.
        
        """
        return self.__evolution_centers[iteration];


    def set_evolution_clusters(self, evolution_clusters):
        """!
        @brief Set evolution of changes of centers during clustering process.
        
        @param[in] evolution_clusters (array_like): Evolution of changes of clusters during clustering process.
        
        """
        self.__evolution_clusters = evolution_clusters;


    def get_clusters(self, iteration):
        """!
        @brief Get method to return allocated clusters at specific iteration of clustering process.
        
        @param[in] iteration (uint): Clustering process iteration at which clusters are required.
        
        @return (array_like) Clusters at specific iteration.
        
        """
        return self.__evolution_clusters[iteration];



class kmeans_visualizer:
    """!
    @brief Visualizer of K-Means algorithm's results.
    @details K-Means visualizer provides visualization services that are specific for K-Means algorithm.
    
    """
    
    __default_2d_marker_size = 15;
    __default_3d_marker_size = 70;
    
    
    @staticmethod
    def show_clusters(sample, clusters, centers, initial_centers = None, **kwargs):
        """!
        @brief Display K-Means clustering results.
        
        @param[in] sample (list): Dataset that was used for clustering.
        @param[in] clusters (array_like): Clusters that were allocated by the algorithm.
        @param[in] centers (array_like): Centers that were allocated by the algorithm.
        @param[in] initial_centers (array_like): Initial centers that were used by the algorithm, if 'None' then initial centers are not displyed.
        @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'figure', 'display').
        
        Keyword Args:
            figure (figure): If 'None' then new is figure is created, otherwise specified figure is used for visualization.
            display (bool): If 'True' then figure will be shown by the method, otherwise it should be shown manually using matplotlib function 'plt.show()'.
            offset (uint): Specify axes index on the figure where results should be drawn (only if argument 'figure' is specified).
        
        @return (figure) Figure where clusters were drawn.
        
        """

        visualizer = cluster_visualizer();
        visualizer.append_clusters(clusters, sample);
        
        offset = kwargs.get('offset', 0);
        figure = kwargs.get('figure', None);
        display = kwargs.get('display', True);

        if (figure is None):
            figure = visualizer.show(display = False);
        else:
            visualizer.show(figure = figure, display = False);
        
        kmeans_visualizer.__draw_centers(figure, offset, visualizer, centers, initial_centers);
        kmeans_visualizer.__draw_rays(figure, offset, visualizer, sample, clusters, centers);
        
        if (display is True):
            plt.show();

        return figure;


    @staticmethod
    def __draw_rays(figure, offset, visualizer, sample, clusters, centers):
        ax = figure.get_axes()[offset];
        
        for index_cluster in range(len(clusters)):
            color = visualizer.get_cluster_color(index_cluster, 0);
            kmeans_visualizer.__draw_cluster_rays(ax, color, sample, clusters[index_cluster], centers[index_cluster])


    @staticmethod
    def __draw_cluster_rays(ax, color, sample, cluster, center):
        dimension = len(sample[0]);
        
        for index_point in cluster:
            point = sample[index_point];
            if dimension == 1:
                ax.plot([point[0], center[0]], [0.0, 0.0], '-', color=color, linewidth=0.5);
            elif dimension == 2:
                ax.plot([point[0], center[0]], [point[1], center[1]], '-', color=color, linewidth=0.5);
            elif dimension == 3:
                ax.plot([point[0], center[0]], [point[1], center[1]], [point[2], center[2]], '-', color=color, linewidth=0.5)


    @staticmethod
    def __draw_center(ax, center, color, marker, alpha):
        dimension = len(center);
        
        if dimension == 1:
            ax.plot(center[0], 0.0, color=color, alpha=alpha, marker=marker, markersize=kmeans_visualizer.__default_2d_marker_size);
        elif dimension == 2:
            ax.plot(center[0], center[1], color=color, alpha=alpha, marker=marker, markersize=kmeans_visualizer.__default_2d_marker_size);
        elif dimension == 3:
            ax.scatter(center[0], center[1], center[2], c=color, alpha=alpha, marker=marker, s=kmeans_visualizer.__default_3d_marker_size);


    @staticmethod
    def __draw_centers(figure, offset, visualizer, centers, initial_centers):
        ax = figure.get_axes()[offset];
        
        for index_center in range(len(centers)):
            color = visualizer.get_cluster_color(index_center, 0);
            kmeans_visualizer.__draw_center(ax, centers[index_center], color, '*', 1.0);
            
            if initial_centers is not None:
                kmeans_visualizer.__draw_center(ax, initial_centers[index_center], color, '*', 0.4);


    @staticmethod
    def animate_cluster_allocation(data, observer, animation_velocity = 500, movie_fps = 1, save_movie = None):
        """!
        @brief Animates clustering process that is performed by K-Means algorithm.

        @param[in] data (list): Dataset that is used for clustering.
        @param[in] observer (kmeans_observer): EM observer that was used for collection information about clustering process.

        @param[in] animation_velocity (uint): Interval between frames in milliseconds (for run-time animation only).
        @param[in] movie_fps (uint): Defines frames per second (for rendering movie only).
        @param[in] save_movie (string): If it is specified then animation will be stored to file that is specified in this parameter.

        """
        figure = plt.figure();

        def init_frame():
            return frame_generation(0);

        def frame_generation(index_iteration):
            figure.clf();

            figure.suptitle("K-Means algorithm (iteration: " + str(index_iteration) + ")", fontsize=18, fontweight='bold');

            clusters = observer.get_clusters(index_iteration);
            centers = observer.get_centers(index_iteration);
            kmeans_visualizer.show_clusters(data, clusters, centers, None, figure=figure, display=False);

            figure.subplots_adjust(top=0.85);

            return [figure.gca()];

        iterations = len(observer);
        cluster_animation = animation.FuncAnimation(figure, frame_generation, iterations, interval=animation_velocity,
                                                    init_func=init_frame, repeat_delay=5000);

        if save_movie is not None:
            cluster_animation.save(save_movie, writer='ffmpeg', fps=movie_fps, bitrate=3000);
        else:
            plt.show();



class kmeans:
    """!
    @brief Class represents K-Means clustering algorithm.
    @details CCORE option can be used to use the pyclustering core - C/C++ shared library for processing that significantly increases performance.
    
    CCORE implementation of the algorithm uses thread pool to parallelize the clustering process.
    
    K-Means clustering results depend on initial centers. Algorithm K-Means++ can used for initialization 
    initial centers from module 'pyclustering.cluster.center_initializer'.
    
    @image html kmeans_example_clustering.png "K-Means clustering results. At the left - 'Simple03.data' sample, at the right - 'Lsun.data' sample."
    
    Example #1 - Trivial clustering:
    @code
        # load list of points for cluster analysis
        sample = read_sample(path);
        
        # create instance of K-Means algorithm
        kmeans_instance = kmeans(sample, [ [0.0, 0.1], [2.5, 2.6] ]);
        
        # run cluster analysis and obtain results
        kmeans_instance.process();
        clusters = kmeans_instance.get_clusters();
    @endcode
    
    Example #2 - Clustering using K-Means++ for center initialization:
    @code
        # load list of points for cluster analysis
        sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE2);
        
        # initialize initial centers using K-Means++ method
        initial_centers = kmeans_plusplus_initializer(sample, 3).initialize();
        
        # create instance of K-Means algorithm with prepared centers
        kmeans_instance = kmeans(sample, initial_centers);
        
        # run cluster analysis and obtain results
        kmeans_instance.process();
        clusters = kmeans_instance.get_clusters();
        final_centers = kmeans_instance.get_centers();
    @endcode
    
    @see center_initializer
    
    """
    
    def __init__(self, data, initial_centers, tolerance = 0.001, ccore = True, **kwargs):
        """!
        @brief Constructor of clustering algorithm K-Means.
        @details Center initializer can be used for creating initial centers, for example, K-Means++ method.
        
        @param[in] data (array_like): Input data that is presented as array of points (objects), each point should be represented by array_like data structure.
        @param[in] initial_centers (array_like): Initial coordinates of centers of clusters that are represented by array_like data structure: [center1, center2, ...].
        @param[in] tolerance (double): Stop condition: if maximum value of change of centers of clusters is less than tolerance then algorithm stops processing.
        @param[in] ccore (bool): Defines should be CCORE library (C++ pyclustering library) used instead of Python code or not.
        @param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'observer').
        
        Keyword Args:
            observer (kmeans_observer): Observer of the algorithm to collect information about clustering process on each iteration.
        
        @see center_initializer
        
        """
        self.__pointer_data = numpy.matrix(data);
        self.__clusters = [];
        self.__centers = numpy.matrix(initial_centers);
        self.__tolerance = tolerance;
        
        self.__observer = None;
        if 'observer' in kwargs:
            self.__observer = kwargs['observer'];
        
        self.__ccore = ccore;
        if self.__ccore is True:
            self.__ccore = ccore_library.workable();


    def process(self):
        """!
        @brief Performs cluster analysis in line with rules of K-Means algorithm.
        
        @remark Results of clustering can be obtained using corresponding get methods.
        
        @see get_clusters()
        @see get_centers()
        
        """

        if len(self.__pointer_data[0]) != len(self.__centers[0]):
            raise NameError('Dimension of the input data and dimension of the initial cluster centers must be equal.');

        if self.__ccore is True:
            self.__process_by_ccore();
        else:
            self.__process_by_python();


    def __process_by_ccore(self):
        """!
        @brief Performs cluster analysis using CCORE (C/C++ part of pyclustering library).

        """
        results = wrapper.kmeans(self.__pointer_data, self.__centers, self.__tolerance, (self.__observer is not None));
        self.__clusters = results[0];
        self.__centers = results[1];

        if self.__observer is not None:
            self.__observer.set_evolution_clusters(results[2]);
            self.__observer.set_evolution_centers(results[3]);


    def __process_by_python(self):
        """!
        @brief Performs cluster analysis using python code.

        """

        maximum_change = float('inf');
        stop_condition = self.__tolerance * self.__tolerance;

        if self.__observer is not None:
            initial_clusters = self.__update_clusters();
            self.__observer.notify(initial_clusters, self.__centers.tolist());

        while maximum_change > stop_condition:
            self.__clusters = self.__update_clusters();
            updated_centers = self.__update_centers();  # changes should be calculated before assignment

            if self.__observer is not None:
                self.__observer.notify(self.__clusters, updated_centers.tolist());

            if len(self.__centers) != len(updated_centers):
                maximum_change = float('inf');

            else:
                changes = numpy.sum(numpy.square(self.__centers - updated_centers), axis=1);
                maximum_change = numpy.max(changes);

            self.__centers = updated_centers.tolist();


    def get_clusters(self):
        """!
        @brief Returns list of allocated clusters, each cluster contains indexes of objects in list of data.
        
        @see process()
        @see get_centers()
        
        """
        
        return self.__clusters;
    
    
    def get_centers(self):
        """!
        @brief Returns list of centers of allocated clusters.
        
        @see process()
        @see get_clusters()
        
        """

        if isinstance(self.__centers, list):
            return self.__centers;
        
        return self.__centers.tolist();


    def get_cluster_encoding(self):
        """!
        @brief Returns clustering result representation type that indicate how clusters are encoded.
        
        @return (type_encoding) Clustering result representation.
        
        @see get_clusters()
        
        """
        
        return type_encoding.CLUSTER_INDEX_LIST_SEPARATION;


    def __update_clusters(self):
        """!
        @brief Calculate Euclidean distance to each point from the each cluster. Nearest points are captured by according clusters and as a result clusters are updated.
        
        @return (list) updated clusters as list of clusters. Each cluster contains indexes of objects from data.
        
        """
        
        clusters = [[] for _ in range(len(self.__centers))];
        
        dataset_differences = numpy.zeros((len(clusters), len(self.__pointer_data)));
        for index_center in range(len(self.__centers)):
            dataset_differences[index_center] = numpy.sum(numpy.square(self.__pointer_data - self.__centers[index_center]), axis=1).T;
        
        optimum_indexes = numpy.argmin(dataset_differences, axis=0);
        for index_point in range(len(optimum_indexes)):
            index_cluster = optimum_indexes[index_point];
            clusters[index_cluster].append(index_point);
        
        clusters = [cluster for cluster in clusters if len(cluster) > 0];
        
        return clusters;
    
    
    def __update_centers(self):
        """!
        @brief Calculate centers of clusters in line with contained objects.
        
        @return (numpy.matrix) Updated centers as list of centers.
        
        """
        
        dimension = self.__pointer_data.shape[1];
        centers = numpy.zeros((len(self.__clusters), dimension));
        
        for index in range(len(self.__clusters)):
            cluster_points = self.__pointer_data[self.__clusters[index], :];
            centers[index] = cluster_points.mean(axis=0);

        return numpy.matrix(centers);


Push — master ( 0ab444...af3192 )

kmeans_visualizer A

Complexity

Size/Duplication

Importance

8 Methods

How to fix Duplicated Code

Duplicated Code

1. Missing Dependencies

2. Missing init.py files

1. Missing Dependencies

2. Missing init.py files

1. Missing Dependencies

2. Missing init.py files

1		"""!
2
3		@brief Cluster analysis algorithm: K-Means
4		@details Based on book description:
5		- J.B.MacQueen. Some Methods for Classification and Analysis of Multivariate Observations. 1967.
6
7		@authors Andrei Novikov ([email protected])
8		@date 2014-2018
9		@copyright GNU Public License
10
11		@cond GNU_PUBLIC_LICENSE
12		PyClustering is free software: you can redistribute it and/or modify
13		it under the terms of the GNU General Public License as published by
14		the Free Software Foundation, either version 3 of the License, or
15		(at your option) any later version.
16
17		PyClustering is distributed in the hope that it will be useful,
18		but WITHOUT ANY WARRANTY; without even the implied warranty of
19		MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
20		GNU General Public License for more details.
21
22		You should have received a copy of the GNU General Public License
23		along with this program. If not, see <http://www.gnu.org/licenses/>.
24		@endcond
25
26		"""
27
28
29		import numpy;
		0 ignored issues – show Configuration introduced 2018-02-27 17:16 UTC by Report Bug Copy Issue Report The import `numpy` could not be resolved. This can be caused by one of the following: 1. Missing Dependencies This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands. # .scrutinizer.yml before_commands: - sudo pip install abc # Python2 - sudo pip3 install abc # Python3 Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version. 2. Missing __init__.py files This error could also result from missing `__init__.py` files in your module folders. Make sure that you place one file in each sub-folder. Loading history...
30
31		import matplotlib.pyplot as plt;
		0 ignored issues – show Configuration introduced 2018-03-03 12:02 UTC by Report Bug Copy Issue Report The import `matplotlib.pyplot` could not be resolved. This can be caused by one of the following: 1. Missing Dependencies This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands. # .scrutinizer.yml before_commands: - sudo pip install abc # Python2 - sudo pip3 install abc # Python3 Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version. 2. Missing __init__.py files This error could also result from missing `__init__.py` files in your module folders. Make sure that you place one file in each sub-folder. Loading history...
32		import matplotlib.animation as animation;
		0 ignored issues – show Configuration introduced 2018-03-03 12:02 UTC by Report Bug Copy Issue Report The import `matplotlib.animation` could not be resolved. This can be caused by one of the following: 1. Missing Dependencies This error could indicate a configuration issue of Pylint. Make sure that your libraries are available by adding the necessary commands. # .scrutinizer.yml before_commands: - sudo pip install abc # Python2 - sudo pip3 install abc # Python3 Tip: We are currently not using virtualenv to run pylint, when installing your modules make sure to use the command for the correct version. 2. Missing __init__.py files This error could also result from missing `__init__.py` files in your module folders. Make sure that you place one file in each sub-folder. Loading history...
33
34		import pyclustering.core.kmeans_wrapper as wrapper;
35
36		from pyclustering.core.wrapper import ccore_library;
37
38		from pyclustering.cluster.encoder import type_encoding;
39		from pyclustering.cluster import cluster_visualizer;
40
41
42		class kmeans_observer:
43		"""!
44		@brief Observer of K-Means algorithm that is used to collect information about clustering process on each iteration of the algorithm.
45
46		@see kmeans
47
48		"""
49
50		def __init__(self):
51		"""!
52		@brief Initializer of observer of K-Means algorithm.
53
54		"""
55		self.__evolution_clusters = [];
56		self.__evolution_centers = [];
57		self.__initial_centers = [];
58
59
60		def __len__(self):
61		"""!
62		@brief Returns amount of steps that were observer during clustering process in K-Means algorithm.
63
64		"""
65		return len(self.__evolution_clusters);
66
67
68		def notify(self, clusters, centers):
69		"""!
70		@brief This method is called by K-Means algorithm to notify about changes.
71
72		@param[in] clusters (array_like): Allocated clusters by K-Means algorithm.
73		@param[in] centers (array_like): Allocated centers by K-Means algorithm.
74
75		"""
76		self.__evolution_clusters.append(clusters);
77		self.__evolution_centers.append(centers);
78
79
80		def set_evolution_centers(self, evolution_centers):
81		"""!
82		@brief Set evolution of changes of centers during clustering process.
83
84		@param[in] evolution_centers (array_like): Evolution of changes of centers during clustering process.
85
86		"""
87		self.__evolution_centers = evolution_centers;
88
89
90		def get_centers(self, iteration):
91		"""!
92		@brief Get method to return centers at specific iteration of clustering process.
93
94		@param[in] iteration (uint): Clustering process iteration at which centers are required.
95
96		@return (array_like) Centers at specific iteration.
97
98		"""
99		return self.__evolution_centers[iteration];
100
101
102		def set_evolution_clusters(self, evolution_clusters):
103		"""!
104		@brief Set evolution of changes of centers during clustering process.
105
106		@param[in] evolution_clusters (array_like): Evolution of changes of clusters during clustering process.
107
108		"""
109		self.__evolution_clusters = evolution_clusters;
110
111
112		def get_clusters(self, iteration):
113		"""!
114		@brief Get method to return allocated clusters at specific iteration of clustering process.
115
116		@param[in] iteration (uint): Clustering process iteration at which clusters are required.
117
118		@return (array_like) Clusters at specific iteration.
119
120		"""
121		return self.__evolution_clusters[iteration];
122
123
124
125		class kmeans_visualizer:
126		"""!
127		@brief Visualizer of K-Means algorithm's results.
128		@details K-Means visualizer provides visualization services that are specific for K-Means algorithm.
129
130		"""
131
132		__default_2d_marker_size = 15;
133		__default_3d_marker_size = 70;
134
135
136		@staticmethod
137		def show_clusters(sample, clusters, centers, initial_centers = None, **kwargs):
138		"""!
139		@brief Display K-Means clustering results.
140
141		@param[in] sample (list): Dataset that was used for clustering.
142		@param[in] clusters (array_like): Clusters that were allocated by the algorithm.
143		@param[in] centers (array_like): Centers that were allocated by the algorithm.
144		@param[in] initial_centers (array_like): Initial centers that were used by the algorithm, if 'None' then initial centers are not displyed.
145		@param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'figure', 'display').
146
147		Keyword Args:
148		figure (figure): If 'None' then new is figure is created, otherwise specified figure is used for visualization.
149		display (bool): If 'True' then figure will be shown by the method, otherwise it should be shown manually using matplotlib function 'plt.show()'.
150		offset (uint): Specify axes index on the figure where results should be drawn (only if argument 'figure' is specified).
151
152		@return (figure) Figure where clusters were drawn.
153
154		"""
155
156		visualizer = cluster_visualizer();
157		visualizer.append_clusters(clusters, sample);
158
159		offset = kwargs.get('offset', 0);
160		figure = kwargs.get('figure', None);
161		display = kwargs.get('display', True);
162
163		if (figure is None):
164		figure = visualizer.show(display = False);
165		else:
166		visualizer.show(figure = figure, display = False);
167
168		kmeans_visualizer.__draw_centers(figure, offset, visualizer, centers, initial_centers);
169		kmeans_visualizer.__draw_rays(figure, offset, visualizer, sample, clusters, centers);
170
171		if (display is True):
172		plt.show();
173
174		return figure;
175
176
177		@staticmethod
178		def __draw_rays(figure, offset, visualizer, sample, clusters, centers):
179		ax = figure.get_axes()[offset];
180
181		for index_cluster in range(len(clusters)):
182		color = visualizer.get_cluster_color(index_cluster, 0);
183		kmeans_visualizer.__draw_cluster_rays(ax, color, sample, clusters[index_cluster], centers[index_cluster])
184
185
186		@staticmethod
187		def __draw_cluster_rays(ax, color, sample, cluster, center):
188		dimension = len(sample[0]);
189
190		for index_point in cluster:
191		point = sample[index_point];
192		if dimension == 1:
193		ax.plot([point[0], center[0]], [0.0, 0.0], '-', color=color, linewidth=0.5);
194		elif dimension == 2:
195		ax.plot([point[0], center[0]], [point[1], center[1]], '-', color=color, linewidth=0.5);
196		elif dimension == 3:
197		ax.plot([point[0], center[0]], [point[1], center[1]], [point[2], center[2]], '-', color=color, linewidth=0.5)
198
199
200		@staticmethod
201		def __draw_center(ax, center, color, marker, alpha):
202		dimension = len(center);
203
204		if dimension == 1:
205		ax.plot(center[0], 0.0, color=color, alpha=alpha, marker=marker, markersize=kmeans_visualizer.__default_2d_marker_size);
206		elif dimension == 2:
207		ax.plot(center[0], center[1], color=color, alpha=alpha, marker=marker, markersize=kmeans_visualizer.__default_2d_marker_size);
208		elif dimension == 3:
209		ax.scatter(center[0], center[1], center[2], c=color, alpha=alpha, marker=marker, s=kmeans_visualizer.__default_3d_marker_size);
210
211
212		@staticmethod
213		def __draw_centers(figure, offset, visualizer, centers, initial_centers):
214		ax = figure.get_axes()[offset];
215
216		for index_center in range(len(centers)):
217		color = visualizer.get_cluster_color(index_center, 0);
218		kmeans_visualizer.__draw_center(ax, centers[index_center], color, '*', 1.0);
219
220		if initial_centers is not None:
221		kmeans_visualizer.__draw_center(ax, initial_centers[index_center], color, '*', 0.4);
222
223
224		@staticmethod
225		def animate_cluster_allocation(data, observer, animation_velocity = 500, movie_fps = 1, save_movie = None):
226		"""!
227		@brief Animates clustering process that is performed by K-Means algorithm.
228
229		@param[in] data (list): Dataset that is used for clustering.
230	View Code Duplication	@param[in] observer (kmeans_observer): EM observer that was used for collection information about clustering process.
		0 ignored issues – show Duplication introduced 2018-04-18 21:52 UTC by Report Bug Copy Issue Report This code seems to be duplicated in your project. Loading history...
231		@param[in] animation_velocity (uint): Interval between frames in milliseconds (for run-time animation only).
232		@param[in] movie_fps (uint): Defines frames per second (for rendering movie only).
233		@param[in] save_movie (string): If it is specified then animation will be stored to file that is specified in this parameter.
234
235		"""
236		figure = plt.figure();
237
238		def init_frame():
239		return frame_generation(0);
240
241		def frame_generation(index_iteration):
242		figure.clf();
243
244		figure.suptitle("K-Means algorithm (iteration: " + str(index_iteration) + ")", fontsize=18, fontweight='bold');
245
246		clusters = observer.get_clusters(index_iteration);
247		centers = observer.get_centers(index_iteration);
248		kmeans_visualizer.show_clusters(data, clusters, centers, None, figure=figure, display=False);
249
250		figure.subplots_adjust(top=0.85);
251
252		return [figure.gca()];
253
254		iterations = len(observer);
255		cluster_animation = animation.FuncAnimation(figure, frame_generation, iterations, interval=animation_velocity,
256		init_func=init_frame, repeat_delay=5000);
257
258		if save_movie is not None:
259		cluster_animation.save(save_movie, writer='ffmpeg', fps=movie_fps, bitrate=3000);
260		else:
261		plt.show();
262
263
264
265		class kmeans:
266		"""!
267		@brief Class represents K-Means clustering algorithm.
268		@details CCORE option can be used to use the pyclustering core - C/C++ shared library for processing that significantly increases performance.
269
270		CCORE implementation of the algorithm uses thread pool to parallelize the clustering process.
271
272		K-Means clustering results depend on initial centers. Algorithm K-Means++ can used for initialization
273		initial centers from module 'pyclustering.cluster.center_initializer'.
274
275		@image html kmeans_example_clustering.png "K-Means clustering results. At the left - 'Simple03.data' sample, at the right - 'Lsun.data' sample."
276
277		Example #1 - Trivial clustering:
278		@code
279		# load list of points for cluster analysis
280		sample = read_sample(path);
281
282		# create instance of K-Means algorithm
283		kmeans_instance = kmeans(sample, [ [0.0, 0.1], [2.5, 2.6] ]);
284
285		# run cluster analysis and obtain results
286		kmeans_instance.process();
287		clusters = kmeans_instance.get_clusters();
288		@endcode
289
290		Example #2 - Clustering using K-Means++ for center initialization:
291		@code
292		# load list of points for cluster analysis
293		sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE2);
294
295		# initialize initial centers using K-Means++ method
296		initial_centers = kmeans_plusplus_initializer(sample, 3).initialize();
297
298		# create instance of K-Means algorithm with prepared centers
299		kmeans_instance = kmeans(sample, initial_centers);
300
301		# run cluster analysis and obtain results
302		kmeans_instance.process();
303		clusters = kmeans_instance.get_clusters();
304		final_centers = kmeans_instance.get_centers();
305		@endcode
306
307		@see center_initializer
308
309		"""
310
311		def __init__(self, data, initial_centers, tolerance = 0.001, ccore = True, **kwargs):
312		"""!
313		@brief Constructor of clustering algorithm K-Means.
314		@details Center initializer can be used for creating initial centers, for example, K-Means++ method.
315
316		@param[in] data (array_like): Input data that is presented as array of points (objects), each point should be represented by array_like data structure.
317		@param[in] initial_centers (array_like): Initial coordinates of centers of clusters that are represented by array_like data structure: [center1, center2, ...].
318		@param[in] tolerance (double): Stop condition: if maximum value of change of centers of clusters is less than tolerance then algorithm stops processing.
319		@param[in] ccore (bool): Defines should be CCORE library (C++ pyclustering library) used instead of Python code or not.
320		@param[in] **kwargs: Arbitrary keyword arguments (available arguments: 'observer').
321
322		Keyword Args:
323		observer (kmeans_observer): Observer of the algorithm to collect information about clustering process on each iteration.
324
325		@see center_initializer
326
327		"""
328		self.__pointer_data = numpy.matrix(data);
329		self.__clusters = [];
330		self.__centers = numpy.matrix(initial_centers);
331		self.__tolerance = tolerance;
332
333		self.__observer = None;
334		if 'observer' in kwargs:
335		self.__observer = kwargs['observer'];
336
337		self.__ccore = ccore;
338		if self.__ccore is True:
339		self.__ccore = ccore_library.workable();
340
341
342		def process(self):
343		"""!
344		@brief Performs cluster analysis in line with rules of K-Means algorithm.
345
346		@remark Results of clustering can be obtained using corresponding get methods.
347
348		@see get_clusters()
349		@see get_centers()
350
351		"""
352
353		if len(self.__pointer_data[0]) != len(self.__centers[0]):
354		raise NameError('Dimension of the input data and dimension of the initial cluster centers must be equal.');
355
356		if self.__ccore is True:
357		self.__process_by_ccore();
358		else:
359		self.__process_by_python();
360
361
362		def __process_by_ccore(self):
363		"""!
364		@brief Performs cluster analysis using CCORE (C/C++ part of pyclustering library).
365
366		"""
367		results = wrapper.kmeans(self.__pointer_data, self.__centers, self.__tolerance, (self.__observer is not None));
368		self.__clusters = results[0];
369		self.__centers = results[1];
370
371		if self.__observer is not None:
372		self.__observer.set_evolution_clusters(results[2]);
373		self.__observer.set_evolution_centers(results[3]);
374
375
376		def __process_by_python(self):
377		"""!
378		@brief Performs cluster analysis using python code.
379
380		"""
381
382		maximum_change = float('inf');
383		stop_condition = self.__tolerance * self.__tolerance;
384
385		if self.__observer is not None:
386		initial_clusters = self.__update_clusters();
387		self.__observer.notify(initial_clusters, self.__centers.tolist());
388
389		while maximum_change > stop_condition:
390		self.__clusters = self.__update_clusters();
391		updated_centers = self.__update_centers(); # changes should be calculated before assignment
392
393		if self.__observer is not None:
394		self.__observer.notify(self.__clusters, updated_centers.tolist());
395
396		if len(self.__centers) != len(updated_centers):
397		maximum_change = float('inf');
398
399		else:
400		changes = numpy.sum(numpy.square(self.__centers - updated_centers), axis=1);
401		maximum_change = numpy.max(changes);
402
403		self.__centers = updated_centers.tolist();
404
405
406		def get_clusters(self):
407		"""!
408		@brief Returns list of allocated clusters, each cluster contains indexes of objects in list of data.
409
410		@see process()
411		@see get_centers()
412
413		"""
414
415		return self.__clusters;
416
417
418		def get_centers(self):
419		"""!
420		@brief Returns list of centers of allocated clusters.
421
422		@see process()
423		@see get_clusters()
424
425		"""
426
427		if isinstance(self.__centers, list):
428		return self.__centers;
429
430		return self.__centers.tolist();
431
432
433		def get_cluster_encoding(self):
434		"""!
435		@brief Returns clustering result representation type that indicate how clusters are encoded.
436
437		@return (type_encoding) Clustering result representation.
438
439		@see get_clusters()
440
441		"""
442
443		return type_encoding.CLUSTER_INDEX_LIST_SEPARATION;
444
445
446		def __update_clusters(self):
447		"""!
448		@brief Calculate Euclidean distance to each point from the each cluster. Nearest points are captured by according clusters and as a result clusters are updated.
449
450		@return (list) updated clusters as list of clusters. Each cluster contains indexes of objects from data.
451
452		"""
453
454		clusters = [[] for _ in range(len(self.__centers))];
455
456		dataset_differences = numpy.zeros((len(clusters), len(self.__pointer_data)));
457		for index_center in range(len(self.__centers)):
458		dataset_differences[index_center] = numpy.sum(numpy.square(self.__pointer_data - self.__centers[index_center]), axis=1).T;
459
460		optimum_indexes = numpy.argmin(dataset_differences, axis=0);
461		for index_point in range(len(optimum_indexes)):
462		index_cluster = optimum_indexes[index_point];
463		clusters[index_cluster].append(index_point);
464
465		clusters = [cluster for cluster in clusters if len(cluster) > 0];
466
467		return clusters;
468
469
470		def __update_centers(self):
471		"""!
472		@brief Calculate centers of clusters in line with contained objects.
473
474		@return (numpy.matrix) Updated centers as list of centers.
475
476		"""
477
478		dimension = self.__pointer_data.shape[1];
479		centers = numpy.zeros((len(self.__clusters), dimension));
480
481		for index in range(len(self.__clusters)):
482		cluster_points = self.__pointer_data[self.__clusters[index], :];
483		centers[index] = cluster_points.mean(axis=0);
484
485		return numpy.matrix(centers);
486

annoviko / pyclustering

Push — master ( 0ab444...af3192 )

kmeans_visualizer A

Complexity

Size/Duplication

Importance

8 Methods

How to fix Duplicated Code

Duplicated Code

1. Missing Dependencies

2. Missing __init__.py files

1. Missing Dependencies

2. Missing __init__.py files

1. Missing Dependencies

2. Missing __init__.py files

Duplication Side-by-Side

Filter issues like

2. Missing init.py files

2. Missing init.py files

2. Missing init.py files