Duplicate code is one of the most pungent code smells. A common rule of thumb is to restructure code once it is duplicated in three or more places.
<?php

// ... (lines collapsed in the source listing)

class LDA extends EigenTransformerBase
{
    /**
     * @var bool
     */
    public $fit = false;

    /**
     * @var array
     */
    public $labels;

    /**
     * @var array
     */
    public $means;

    /**
     * @var array
     */
    public $counts;

    /**
     * @var float
     */
    public $overallMean;

    // Flagged by the review: code duplication.
    /**
     * Linear Discriminant Analysis (LDA) is used to reduce the dimensionality
     * of the data. Unlike Principal Component Analysis (PCA), it is a supervised
     * technique that requires the class labels in order to fit the data to a
     * lower dimensional space. <br><br>
     * The algorithm can be initialized by specifying
     * either totalVariance (a value between 0.1 and 0.99)
     * or numFeatures (number of features in the dataset) to be preserved.
     *
     * @param float|null $totalVariance Total explained variance to be preserved
     * @param int|null $numFeatures Number of features to be preserved
     *
     * @throws \Exception
     */
    public function __construct($totalVariance = null, $numFeatures = null)
    {
        // ... (body collapsed in the source listing)
    }

    /**
     * Trains the algorithm to transform the given data to a lower dimensional space.
     *
     * @param array $data
     * @param array $classes
     *
     * @return array
     */
    public function fit(array $data, array $classes): array
    {
        // ... (body collapsed in the source listing)
    }

    /**
     * Returns unique labels in the dataset
     *
     * @param array $classes
     *
     * @return array
     */
    protected function getLabels(array $classes): array
    {
        // ... (body collapsed in the source listing)
    }

    /**
     * Calculates mean of each column for each class and returns
     * n by m matrix where n is number of labels and m is number of columns
     *
     * @param array $data
     * @param array $classes
     *
     * @return array
     */
    protected function calculateMeans($data, $classes): array
    {
        // ... (body collapsed in the source listing)
    }

    /**
     * Returns in-class scatter matrix for each class, which
     * is an n by m matrix where n is number of classes and
     * m is number of columns
     *
     * @param array $data
     * @param array $classes
     *
     * @return Matrix
     */
    protected function calculateClassVar($data, $classes)
    {
        // ... (body collapsed in the source listing)
    }

    /**
     * Returns between-class scatter matrix for each class, which
     * is an n by m matrix where n is number of classes and
     * m is number of columns
     *
     * @return Matrix
     */
    protected function calculateClassCov()
    {
        // ... (body collapsed in the source listing)
    }

    /**
     * Returns the result of the calculation (x - m)T.(x - m)
     *
     * @param array $row
     * @param array $means
     *
     * @return Matrix
     */
    protected function calculateVar(array $row, array $means)
    {
        // ... (body collapsed in the source listing)
    }

    // Flagged by the review: code duplication.
    /**
     * Transforms the given sample to a lower dimensional vector by using
     * the eigenVectors obtained in the last run of <code>fit</code>.
     *
     * @param array $sample
     *
     * @return array
     */
    public function transform(array $sample)
    {
        // ... (body collapsed in the source listing)
    }
}
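The docblock for calculateVar describes the per-sample scatter term (x - m)T.(x - m), but its body is collapsed in the listing. The following is a minimal sketch of that calculation using plain PHP arrays. The function name scatterOfRow is hypothetical, and returning a nested array instead of a Matrix object is an assumption made to keep the example self-contained; it is not the library's implementation.

```php
<?php
declare(strict_types=1);

/**
 * Outer product (x - m)^T (x - m) for a single sample.
 * Hypothetical stand-in: the real calculateVar() returns a Matrix object.
 *
 * @param array $row   feature values of one sample (x)
 * @param array $means column means for the sample's class (m)
 *
 * @return array square nested array, one row and column per feature
 */
function scatterOfRow(array $row, array $means): array
{
    // Re-index both inputs so positions line up regardless of the original keys.
    $row = array_values($row);
    $means = array_values($means);

    if (count($row) !== count($means)) {
        throw new InvalidArgumentException('row and means must have the same length');
    }

    // Difference vector d = x - m.
    $diff = [];
    foreach ($row as $i => $value) {
        $diff[$i] = $value - $means[$i];
    }

    // Outer product: result[i][j] = d[i] * d[j].
    $result = [];
    foreach ($diff as $i => $di) {
        foreach ($diff as $j => $dj) {
            $result[$i][$j] = $di * $dj;
        }
    }

    return $result;
}

// x = [2, 5], m = [1, 3], so d = [1, 2].
$scatter = scatterOfRow([2.0, 5.0], [1.0, 3.0]);
// $scatter is [[1.0, 2.0], [2.0, 4.0]]
```

Summing these per-sample matrices over all samples of a class is one common way to build an in-class scatter matrix, which is presumably what calculateClassVar does with the result.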
Duplicated code is one of the most pungent code smells. If you need the same code in three or more places, we strongly encourage you to extract it into a single class or operation.
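As an illustration of that advice, here is a minimal sketch of moving a duplicated constructor into a shared parent class. It assumes the duplicate of the flagged constructor lives in a sibling transformer; the listing does not say where the duplicate is. The names ReducerSketch, LdaSketch and PcaSketch are hypothetical. The 0.1 to 0.99 range comes from the docblock, while the "exactly one argument" rule and the exception messages are assumptions.

```php
<?php
declare(strict_types=1);

// Hypothetical shared parent; in the listing this role would fall to EigenTransformerBase.
abstract class ReducerSketch
{
    /** @var float|null */
    protected $totalVariance;

    /** @var int|null */
    protected $numFeatures;

    // One constructor, inherited by every reducer, instead of one copy per class.
    public function __construct(?float $totalVariance = null, ?int $numFeatures = null)
    {
        // Range taken from the docblock: a value between 0.1 and 0.99.
        if ($totalVariance !== null && ($totalVariance < 0.1 || $totalVariance > 0.99)) {
            throw new Exception('totalVariance must be between 0.1 and 0.99');
        }

        // Assumption: "either ... or" means exactly one of the two is given.
        if (($totalVariance === null) === ($numFeatures === null)) {
            throw new Exception('Specify exactly one of totalVariance or numFeatures');
        }

        $this->totalVariance = $totalVariance;
        $this->numFeatures = $numFeatures;
    }

    public function getTotalVariance(): ?float
    {
        return $this->totalVariance;
    }

    public function getNumFeatures(): ?int
    {
        return $this->numFeatures;
    }
}

// Subclasses no longer carry their own copy of the validation logic.
final class LdaSketch extends ReducerSketch
{
}

final class PcaSketch extends ReducerSketch
{
}

$lda = new LdaSketch(0.9);      // keep 90% of the explained variance
$pca = new PcaSketch(null, 2);  // keep two features
```

Inheritance is only one option. If the classes could not share a parent, a trait or a small validator class would remove the duplication in the same way.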