Completed
Push — 0.7.dev (bfeadc...20409b)
by Andrei, created 01:28

ema (rating: B)

Complexity

Total Complexity: 40

Size/Duplication

Total Lines: 157
Duplicated Lines: 0%

Importance

Changes: 0

Metric   Value
dl       0
loc      157
rs       8.2608
c        0
b        0
f        0
wmc      40

15 Methods

Rating   Name   Duplication   Size   Complexity  
A get_centers() 0 2 1
A __update_mean() 0 7 2
A process() 0 15 4
A get_clusters() 0 2 1
B __init__() 0 17 5
A __get_random_means() 0 12 3
A __expectation_step() 0 7 4
A __maximization_step() 0 8 2
A __extract_clusters() 0 11 4
A __probabilities() 0 7 2
A get_covariances() 0 2 1
A __log_likelihood() 0 11 3
A __update_covariance() 0 8 2
A __get_stop_flag() 0 6 3
A __get_random_covariances() 0 13 3

How to fix: Complexity

Complex Class

Complex classes like ema often do a lot of different things. To break such a class down, we need to identify a cohesive component within the class. A common approach to finding such a component is to look for fields and methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate, and is often faster.
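As an illustration, the two `__get_random_*` initializer methods of ema share a prefix and a purpose, so they are a natural candidate for Extract Class. The sketch below shows the shape of that refactoring on a stripped-down stand-in; the class and method names (`random_initializer`, `ema_like`, `get_random_indexes`) are invented for this example and are not part of pyclustering.

```python
# Hypothetical sketch of Extract Class applied to a class like ema:
# the random-initialization logic moves into a dedicated component.
import random


class random_initializer:
    """Cohesive component: produces distinct random point indexes."""

    def __init__(self, data):
        self.__data = data

    def get_random_indexes(self, amount):
        # Draw 'amount' distinct indexes, mirroring the retry loop
        # used by __get_random_means().
        indexes = []
        while len(indexes) < amount:
            candidate = random.randint(0, len(self.__data) - 1)
            if candidate not in indexes:
                indexes.append(candidate)
        return indexes


class ema_like:
    """The host class now delegates its initialization to the component."""

    def __init__(self, data, amount_clusters):
        initializer = random_initializer(data)
        indexes = initializer.get_random_indexes(amount_clusters)
        self.means = [data[index] for index in indexes]


algorithm = ema_like([[0.0], [1.0], [2.0], [3.0]], 2)
print(len(algorithm.means))   # two distinct starting means
```

The same move would carry `__get_random_covariances()` into the component as well, shrinking the wmc of ema without changing its public API.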

"""!

@brief Cluster analysis algorithm: Expectation-Maximization Algorithm (EMA).
@details Implementation based on article:
         - 

@authors Andrei Novikov ([email protected])
@date 2014-2017
@copyright GNU Public License

@cond GNU_PUBLIC_LICENSE
    PyClustering is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    PyClustering is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
@endcond

"""


import numpy

from pyclustering.cluster import cluster_visualizer
from pyclustering.utils import pi

# matplotlib is reserved for the covariance-ellipse drawing planned in
# ema_visualizer (see the TODO there).
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse


def gaussian(data, mean, covariance):
    """Return the multivariate normal density of each point in 'data'."""
    dimension = len(data[0])

    if dimension != 1:
        inv_variance = numpy.linalg.inv(covariance)
        determinant = numpy.linalg.det(covariance)
    else:
        inv_variance = 1.0 / covariance
        determinant = covariance

    # Normalization constant 1 / ((2 * pi)^(d / 2) * |covariance|^0.5);
    # the determinant of the covariance matrix is required here, not its norm.
    right_const = 1.0 / ((pi * 2.0) ** (dimension / 2.0) * determinant ** 0.5)

    result = []

    for point in data:
        mean_delta = point - mean
        point_gaussian = right_const * numpy.exp(-0.5 * mean_delta.dot(inv_variance).dot(numpy.transpose(mean_delta)))
        result.append(point_gaussian)

    return result


class ema_observer:
    def __init__(self):
        self.__means = []
        self.__covariances = []

    def get_iterations(self):
        return len(self.__means)

    def get_means(self):
        return self.__means

    def get_covariances(self):
        return self.__covariances

    def notify(self, means, covariances):
        self.__means.append(means)
        self.__covariances.append(covariances)


class ema_visualizer:
    @staticmethod
    def show_clusters(clusters, sample, covariances):
        # 'covariances' is not used yet - it is needed for the ellipse
        # drawing below.
        visualizer = cluster_visualizer()
        visualizer.append_clusters(clusters, sample)
        figure = visualizer.show(display = False)

        # TODO: draw ellipses for each cluster using the covariance matrix.
        return figure


class ema:
    def __init__(self, data, amount_clusters, means = None, variances = None):
        self.__data = numpy.array(data)
        self.__amount_clusters = amount_clusters

        self.__means = means
        if means is None:
            self.__means = self.__get_random_means(data, amount_clusters)

        self.__variances = variances
        if variances is None:
            self.__variances = self.__get_random_covariances(data, amount_clusters)

        self.__rc = [ [0.0] * len(self.__data) for _ in range(amount_clusters) ]
        self.__pic = [1.0] * amount_clusters
        self.__clusters = []
        self.__gaussians = [ [] for _ in range(amount_clusters) ]
        self.__stop = False


    def process(self):
        self.__clusters = None

        previous_likelihood = -10000500
        current_likelihood = -10000000

        while (self.__stop is False) and (abs(previous_likelihood - current_likelihood) > 0.00001) and (current_likelihood < 0.0):
            self.__expectation_step()
            self.__maximization_step()

            previous_likelihood = current_likelihood
            current_likelihood = self.__log_likelihood()
            self.__stop = self.__get_stop_flag()

        self.__extract_clusters()


    def get_clusters(self):
        return self.__clusters


    def get_centers(self):
        return self.__means


    def get_covariances(self):
        return self.__variances


    def __extract_clusters(self):
        self.__clusters = []
        for index_cluster in range(self.__amount_clusters):
            cluster = []
            for index_point in range(len(self.__data)):
                if self.__rc[index_cluster][index_point] >= 0.5:
                    cluster.append(index_point)

            self.__clusters.append(cluster)

        return self.__clusters


    def __log_likelihood(self):
        likelihood = 0.0

        for index_point in range(len(self.__data)):
            particle = 0.0
            for index_cluster in range(self.__amount_clusters):
                particle += self.__pic[index_cluster] * self.__gaussians[index_cluster][index_point]

            likelihood += numpy.log(particle)

        return likelihood


    def __probabilities(self, index_cluster, index_point):
        divider = 0.0
        for i in range(self.__amount_clusters):
            divider += self.__pic[i] * self.__gaussians[i][index_point]

        rc = self.__pic[index_cluster] * self.__gaussians[index_cluster][index_point] / divider
        return rc


    def __expectation_step(self):
        for index_cluster in range(self.__amount_clusters):
            self.__gaussians[index_cluster] = gaussian(self.__data, self.__means[index_cluster], self.__variances[index_cluster])

        for index_cluster in range(self.__amount_clusters):
            for index_point in range(len(self.__data)):
                self.__rc[index_cluster][index_point] = self.__probabilities(index_cluster, index_point)


    def __maximization_step(self):
        for index_cluster in range(self.__amount_clusters):
            mc = numpy.sum(self.__rc[index_cluster])

            self.__pic[index_cluster] = mc / len(self.__data)
            self.__means[index_cluster] = self.__update_mean(index_cluster, mc)

            self.__variances[index_cluster] = self.__update_covariance(index_cluster, mc)


    def __get_stop_flag(self):
        for covariance in self.__variances:
            if min(covariance[0]) == 0:
                return True

        return False


    def __update_covariance(self, index_cluster, mc):
        covariance = 0.0
        for index_point in range(len(self.__data)):
            deviation = numpy.array( [ self.__data[index_point] - self.__means[index_cluster] ] )
            covariance += self.__rc[index_cluster][index_point] * deviation.T.dot(deviation)

        covariance = covariance / mc
        return covariance


    def __update_mean(self, index_cluster, mc):
        mean = 0.0
        for index_point in range(len(self.__data)):
            mean += self.__rc[index_cluster][index_point] * self.__data[index_point]

        mean = mean / mc
        return mean


    def __get_random_covariances(self, data, amount):
        covariances = []
        covariance_appendixes = []
        data_covariance = numpy.cov(data, rowvar = False)
        for _ in range(amount):
            random_appendix = numpy.min(data_covariance) * 0.5 * numpy.random.random()
            while random_appendix in covariance_appendixes:
                random_appendix = numpy.min(data_covariance) * 0.5 * numpy.random.random()

            covariance_appendixes.append(random_appendix)
            covariances.append(data_covariance - random_appendix)

        return covariances


    def __get_random_means(self, data, amount):
        means = []
        mean_indexes = []
        for _ in range(amount):
            random_index = numpy.random.randint(0, len(data))
            while random_index in mean_indexes:
                random_index = numpy.random.randint(0, len(data))

            mean_indexes.append(random_index)
            means.append(numpy.array(data[random_index]))

        return means
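
The E-step computed by __probabilities() forms, for each point, the posterior responsibility rc = pic_k * N_k(x) / sum_j(pic_j * N_j(x)); by construction these responsibilities sum to one over the clusters, which is what __extract_clusters() implicitly relies on when it thresholds at 0.5. A minimal self-contained check of that identity, using a toy one-dimensional two-cluster mixture (all numbers here are invented for illustration, not taken from the class above):

```python
import math


def gaussian_1d(x, mean, variance):
    # Univariate normal density, matching the 1-D branch of gaussian().
    return math.exp(-0.5 * (x - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)


# Toy mixture: two clusters with equal priors (pic) and unit variances.
pic = [0.5, 0.5]
means = [0.0, 4.0]
variances = [1.0, 1.0]

point = 1.0
densities = [gaussian_1d(point, means[k], variances[k]) for k in range(2)]

# Same normalization as __probabilities(): divide each weighted density
# by the total mixture density at the point.
divider = sum(pic[k] * densities[k] for k in range(2))
rc = [pic[k] * densities[k] / divider for k in range(2)]

# Responsibilities are a normalized posterior: they sum to one.
print(round(sum(rc), 10))   # 1.0
```

Because at most one responsibility can exceed 0.5 under this normalization, the >= 0.5 threshold in __extract_clusters() assigns each point to at most one cluster.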