Duplicate code is one of the most pungent code smells. A common rule of thumb is to restructure code once it is duplicated in three or more places.
Common duplication problems and their corresponding solutions are:
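As an illustration only (it is not taken from the original list), one widely used remedy is to extract the repeated logic into a single method. The sketch below assumes a hypothetical `Report` class whose summing loop used to be copied into two public methods; the class and method names are made up for this example.

```php
<?php

// Hypothetical class for illustration; it is not part of the listing below.
class Report
{
    public function total(array $values): float
    {
        return $this->sum($values);
    }

    public function average(array $values): float
    {
        // max() guards against division by zero for an empty array.
        return $this->sum($values) / max(count($values), 1);
    }

    // Extracted helper: before the refactoring, this loop was copied
    // into both public methods above.
    private function sum(array $values): float
    {
        $sum = 0.0;
        foreach ($values as $value) {
            $sum += $value;
        }

        return $sum;
    }
}

$report = new Report();
```

After the extraction, a fix to the summing logic only has to be made in one place.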
```php
<?php

// ...

class DecisionStump extends WeightedClassifier
{
    use Trainable, Predictable;

    /**
     * @var int
     */
    protected $givenColumnIndex;


    /**
     * Sample weights : If used the optimization on the decision value
     * will take these weights into account. If not given, all samples
     * will be weighed with the same value of 1
     *
     * @var array
     */
    protected $weights = null;

    /**
     * Lowest error rate obtained while training/optimizing the model
     *
     * @var float
     */
    protected $trainingErrorRate;

    /**
     * @var int
     */
    protected $column;

    /**
     * @var mixed
     */
    protected $value;

    /**
     * @var string
     */
    protected $operator;

    /**
     * @var array
     */
    protected $columnTypes;

    /**
     * A DecisionStump classifier is a one-level deep DecisionTree. It is generally
     * used with ensemble algorithms as in the weak classifier role. <br>
     *
     * If columnIndex is given, then the stump tries to produce a decision node
     * on this column, otherwise in cases given the value of -1, the stump itself
     * decides which column to take for the decision (Default DecisionTree behaviour)
     *
     * @param int $columnIndex
     */
    public function __construct(int $columnIndex = -1)
    // ...

    /**
     * @param array $samples
     * @param array $targets
     */
    public function train(array $samples, array $targets)
    // ...

    /**
     * Determines best split point for the given column
     *
     * @param int $col
     *
     * @return array
     */
    protected function getBestNumericalSplit(int $col)
    // ...

    /**
     *
     * @param int $col
     *
     * @return array
     */
    protected function getBestNominalSplit(int $col)
    // ...


    /**
     *
     * @param type $lVal
     * @param type $op
     * @param type $rVal
     *
     * @return boolean
     */
    protected function evaluate($lVal, $op, $rVal)
    // ...

    /**
     * Calculates the ratio of wrong predictions based on the new threshold
     * value given as the parameter
     *
     * @param float $threshold
     * @param string $operator
     * @param array $values
     */
    protected function calculateErrorRate(float $threshold, string $operator, array $values)
    // ...

    /**
     * @param array $sample
     * @return mixed
     */
    protected function predictSample(array $sample)
    // ...

    public function __toString()
    // ...
}
```
In PHP it is possible to write to properties without declaring them. For example, the following is perfectly valid PHP code:
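A minimal sketch of such code, assuming a hypothetical `MyClass` that declares no properties at all (the class and property names are invented for this example). Note that PHP 8.2 and later report a deprecation notice when a property is created this way, although the code still runs.

```php
<?php

// Hypothetical class for illustration; it declares no properties.
class MyClass
{
}

$x = new MyClass();

// The assignment creates the property "foo" on this one object at runtime.
// A typo such as $x->fooo = true; would silently create a second property.
$x->foo = true;
```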
Generally, it is good practice to explicitly declare properties to avoid accidental typos and to provide IDE auto-completion:
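A sketch of the declared form, again with hypothetical names (`Stump`, `setColumn`, and `getColumn` are assumptions made for this example, not code from the listing above):

```php
<?php

// Hypothetical class for illustration.
class Stump
{
    /**
     * Declared up front, so the property is visible to IDEs and
     * static analysers.
     *
     * @var int
     */
    protected $column = -1;

    public function setColumn(int $column): void
    {
        // Because $column is declared above, tooling can flag a
        // misspelling such as $this->colum on this line.
        $this->column = $column;
    }

    public function getColumn(): int
    {
        return $this->column;
    }
}

$stump = new Stump();
$stump->setColumn(3);
```

Declaring the property also gives it a documented type and a default value, so readers do not have to search the methods to learn which state the class holds.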