Complex classes like DecisionTree often do a lot of different things. To break such a class down, we need to identify a cohesive component within it. A common approach is to look for fields and methods that share the same prefixes or suffixes. You can also look at the cohesion graph to spot unconnected or weakly connected components.
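The prefix/suffix heuristic can be approximated mechanically. The sketch below splits each member name into camelCase words and reports the words shared by at least two members. The function name `sharedNameParts` is hypothetical, and ignoring the accessor verbs `get`, `set` and `is` is an assumption. The member names themselves are copied from the DecisionTree listing.

```php
<?php
// Sketch of the "shared prefixes/suffixes" heuristic: split each member name
// into camelCase words and index members by word. Words shared by two or more
// members hint at a cohesive component. sharedNameParts() is hypothetical.
function sharedNameParts(array $members): array
{
    // Assumption: accessor verbs say nothing about cohesion, so skip them.
    $ignored = ['get', 'set', 'is'];
    $index = [];

    foreach ($members as $member) {
        // "getSelectedFeatures" -> ["get", "Selected", "Features"]
        $parts = preg_split('/(?=[A-Z])/', $member, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($parts as $part) {
            $word = strtolower($part);
            if (in_array($word, $ignored, true)) {
                continue;
            }
            $index[$word][] = $member;
        }
    }

    // Keep only the words that link at least two members.
    return array_filter($index, fn (array $names) => count($names) >= 2);
}

// Field and method names copied from the DecisionTree listing.
$members = [
    'samples', 'columnTypes', 'labels', 'featureCount', 'tree', 'maxDepth',
    'actualDepth', 'numUsableFeatures', 'getColumnTypes', 'getSplitLeaf',
    'getBestSplit', 'getSelectedFeatures', 'getGiniIndex', 'preprocess',
    'isCategoricalColumn', 'setNumFeatures', 'getHtml', 'predictSample',
];

foreach (sharedNameParts($members) as $word => $names) {
    echo $word, ': ', implode(', ', $names), PHP_EOL;
}
```

For these members the groups are `column`/`types` (`columnTypes`, `getColumnTypes`, `isCategoricalColumn`), `depth`, `num`/`features` (`numUsableFeatures`, `getSelectedFeatures`, `setNumFeatures`) and `split` (`getSplitLeaf`, `getBestSplit`). Name overlap is only a hint; whether the members really collaborate still has to be confirmed in the code or the cohesion graph.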
Once you have determined which fields belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate and is often faster.
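As an illustration of Extract Class, the column-type members of DecisionTree (`CONTINUOS`, `NOMINAL`, `$columnTypes`, `getColumnTypes()`, `isCategoricalColumn()`) could move into their own class. This is a minimal sketch: the names `ColumnTypeDetector`, `detect()` and `DecisionTreeSketch` are hypothetical, and the detection rule (a column is nominal if it contains any non-numeric value) is an assumed stand-in, not DecisionTree's actual logic.

```php
<?php
// Hypothetical class produced by Extract Class. It takes over the column-type
// constants and detection that DecisionTree keeps in $columnTypes,
// getColumnTypes() and isCategoricalColumn(). Constant spelling follows the listing.
final class ColumnTypeDetector
{
    const CONTINUOS = 1;
    const NOMINAL = 2;

    /**
     * @param array $samples rows of feature values
     * @return int[] one type constant per column
     */
    public function detect(array $samples): array
    {
        if ($samples === []) {
            return [];
        }

        $types = [];
        $columnCount = count(reset($samples));
        for ($column = 0; $column < $columnCount; $column++) {
            $values = array_column($samples, $column);
            $types[] = $this->isCategoricalColumn($values) ? self::NOMINAL : self::CONTINUOS;
        }

        return $types;
    }

    // Assumed heuristic for illustration only (not DecisionTree's actual rule):
    // a column is nominal as soon as it holds a non-numeric value.
    public function isCategoricalColumn(array $columnValues): bool
    {
        foreach ($columnValues as $value) {
            if (!is_numeric($value)) {
                return true;
            }
        }

        return false;
    }
}

// Hypothetical, slimmed-down DecisionTree that now delegates type detection.
final class DecisionTreeSketch
{
    private $columnTypes = [];
    private $columnTypeDetector;

    public function __construct(?ColumnTypeDetector $detector = null)
    {
        $this->columnTypeDetector = $detector ?? new ColumnTypeDetector();
    }

    public function train(array $samples, array $targets): void
    {
        $this->columnTypes = $this->columnTypeDetector->detect($samples);
        // Tree construction (getSplitLeaf(), getBestSplit(), ...) would stay here.
    }

    public function columnTypes(): array
    {
        return $this->columnTypes;
    }
}
```

Injecting the detector through the constructor keeps the extracted logic testable on its own and lets the tree be given a different detection strategy without further changes to DecisionTree.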
While breaking up the class, it is also a good idea to analyze how other classes use DecisionTree and, based on those observations, apply Extract Interface.
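Extract Interface can be sketched as follows, assuming that one caller only configures feature subsampling through `setNumFeatures()` and another only renders the tree through `getHtml()`. The interface names, the two client functions and `FakeTree` are hypothetical; in practice the split should follow the call sites you actually find.

```php
<?php
// Hypothetical role interfaces extracted from DecisionTree's public methods.
interface FeatureSubsettable
{
    public function setNumFeatures(int $numFeatures);
}

interface HtmlRenderable
{
    public function getHtml();
}

// Each client depends only on the role it uses, not on DecisionTree itself.
function limitFeatures(FeatureSubsettable $tree, int $numFeatures): void
{
    $tree->setNumFeatures($numFeatures);
}

function renderReport(HtmlRenderable $tree): string
{
    return '<section>' . $tree->getHtml() . '</section>';
}

// Stand-in used here instead of the real DecisionTree, which would
// declare "implements Classifier, FeatureSubsettable, HtmlRenderable".
final class FakeTree implements FeatureSubsettable, HtmlRenderable
{
    private $numFeatures = 0;

    public function setNumFeatures(int $numFeatures)
    {
        $this->numFeatures = $numFeatures;
        return $this;
    }

    public function getHtml()
    {
        return "<div>features: {$this->numFeatures}</div>";
    }
}
```

Because the clients no longer name the concrete class, the pieces extracted from DecisionTree can be rearranged later without touching them.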
```php
<?php
// ...
class DecisionTree implements Classifier
{
    use Trainable, Predictable;

    const CONTINUOS = 1;
    const NOMINAL = 2;

    /**
     * @var array
     */
    private $samples = [];

    /**
     * @var array
     */
    private $columnTypes;

    /**
     * @var array
     */
    private $labels = [];

    /**
     * @var int
     */
    private $featureCount = 0;

    /**
     * @var DecisionTreeLeaf
     */
    private $tree = null;

    /**
     * @var int
     */
    private $maxDepth;

    /**
     * @var int
     */
    public $actualDepth = 0;

    /**
     * @var int
     */
    private $numUsableFeatures = 0;

    /**
     * @param int $maxDepth
     */
    public function __construct($maxDepth = 10)
    // ...

    /**
     * @param array $samples
     * @param array $targets
     */
    public function train(array $samples, array $targets)
    // ...

    protected function getColumnTypes(array $samples)
    // ...

    /**
     * @param null|array $records
     * @return DecisionTreeLeaf
     */
    protected function getSplitLeaf($records, $depth = 0)
    // ...

    /**
     * @param array $records
     * @return DecisionTreeLeaf[]
     */
    protected function getBestSplit($records)
    // ...

    /**
     * @return array
     */
    protected function getSelectedFeatures()
    // ...

    /**
     * @param string $baseValue
     * @param array $colValues
     * @param array $targets
     */
    public function getGiniIndex($baseValue, $colValues, $targets)
    // ...

    /**
     * @param array $samples
     * @return array
     */
    protected function preprocess(array $samples)
    // ...

    /**
     * @param array $columnValues
     * @return bool
     */
    protected function isCategoricalColumn(array $columnValues)
    // ...

    /**
     * This method is used to set number of columns to be used
     * when deciding a split at an internal node of the tree. <br>
     * If the value is given 0, then all features are used (default behaviour),
     * otherwise the given value will be used as a maximum for number of columns
     * randomly selected for each split operation.
     *
     * @param int $numFeatures
     * @return $this
     * @throws Exception
     */
    public function setNumFeatures(int $numFeatures)
    // ...

    /**
     * @return string
     */
    public function getHtml()
    // ...

    /**
     * @param array $sample
     * @return mixed
     */
    protected function predictSample(array $sample)
    // ...
}
```
This check looks at variables that have been passed in as parameters and are passed out again to other methods.
If the outgoing method call has stricter type requirements than the method itself, an issue is raised.
An additional type check before the outgoing call may prevent such errors.
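A minimal sketch of such a check, assuming a hypothetical `TreeOptions` class: `configure()` receives an untyped parameter and passes it on to `setNumFeatures(int $numFeatures)`, whose signature is taken from the DecisionTree listing. The explicit `is_int()` guard rejects bad input with a clear message before the stricter call is reached.

```php
<?php
// Hypothetical class illustrating the check: configure() takes an untyped
// parameter and passes it on to setNumFeatures(), which requires an int.
final class TreeOptions
{
    private $numFeatures = 0;

    public function setNumFeatures(int $numFeatures)
    {
        $this->numFeatures = $numFeatures;
        return $this;
    }

    public function getNumFeatures(): int
    {
        return $this->numFeatures;
    }

    /**
     * @param mixed $numFeatures value from an untyped source, e.g. a config array
     */
    public function configure($numFeatures)
    {
        // The additional type check: fail early with a clear message instead of
        // relying on PHP's scalar coercion or a TypeError inside setNumFeatures().
        if (!is_int($numFeatures)) {
            throw new InvalidArgumentException(
                'numFeatures must be an int, got ' . gettype($numFeatures)
            );
        }

        return $this->setNumFeatures($numFeatures);
    }
}
```

Note that without `declare(strict_types=1)` PHP would silently coerce a numeric string such as `"3"` when calling `setNumFeatures()` directly, while this guard rejects it. Declaring the parameter of `configure()` itself as `int` is the other way to bring both signatures in line.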