Complex classes like DecisionTree often do a lot of different things. To break such a class down, we need to identify a cohesive component within it. A common approach is to look for fields and methods that share the same prefix or suffix. You can also look at the cohesion graph to spot any unconnected or weakly connected components.
Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
While breaking up the class, it is a good idea to analyze how other classes use DecisionTree and, based on these observations, to apply Extract Interface as well.
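As a rough illustration of Extract Class combined with Extract Interface, the sketch below pulls column-type detection out into its own class. Method names such as getColumnTypes and isCategoricalColumn in DecisionTree suggest that this could be one cohesive component, but that is only a guess from the names. The interface ColumnTypeDetector, the class NumericColumnTypeDetector, its constants, and the "non-numeric means nominal" rule are all hypothetical and are not taken from the original class.

```php
<?php
// Hypothetical interface: what callers would need from the extracted component.
interface ColumnTypeDetector
{
    /** @return int[] one type constant per column */
    public function detect(array $samples): array;
}

// Hypothetical extracted class; the detection rule is an assumption,
// not the logic DecisionTree actually uses.
final class NumericColumnTypeDetector implements ColumnTypeDetector
{
    // Constants local to this sketch (not the ones declared on DecisionTree).
    const CONTINUOUS = 1;
    const NOMINAL = 2;

    public function detect(array $samples): array
    {
        if ($samples === []) {
            return [];
        }

        $types = [];
        $featureCount = count($samples[0]);
        for ($i = 0; $i < $featureCount; $i++) {
            // Collect the i-th value of every sample.
            $values = array_column($samples, $i);
            $types[] = $this->isCategorical($values) ? self::NOMINAL : self::CONTINUOUS;
        }

        return $types;
    }

    // Assumed rule: a column is categorical as soon as one value is non-numeric.
    private function isCategorical(array $values): bool
    {
        foreach ($values as $value) {
            if (!is_numeric($value)) {
                return true;
            }
        }

        return false;
    }
}

$detector = new NumericColumnTypeDetector();
$types = $detector->detect([[1.5, 'red'], [2.0, 'blue']]);
```

With a split along these lines, the tree class would receive a ColumnTypeDetector through its constructor and delegate to it, so the detection rule could be tested and replaced without touching the tree-building code.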
<?php

// ...

class DecisionTree implements Classifier
{
    use Trainable, Predictable;

    const CONTINUOS = 1;
    const NOMINAL = 2;

    /**
     * @var array
     */
    private $samples = array();

    /**
     * @var array
     */
    private $columnTypes;
    /**
     * @var array
     */
    private $labels = array();
    /**
     * @var int
     */
    private $featureCount = 0;
    /**
     * @var DecisionTreeLeaf
     */
    private $tree = null;

    /**
     * @var int
     */
    private $maxDepth;

    /**
     * @var int
     */
    public $actualDepth = 0;

    /**
     * @param int $maxDepth
     */
    public function __construct($maxDepth = 10)
    // ...

    /**
     * @param array $samples
     * @param array $targets
     */
    public function train(array $samples, array $targets)
    // ...

    protected function getColumnTypes(array $samples)
    // ...

    /**
     * @param null|array $records
     * @return DecisionTreeLeaf
     */
    protected function getSplitLeaf($records, $depth = 0)
    // ...

    /**
     * @param array $records
     * @return DecisionTreeLeaf[]
     */
    protected function getBestSplit($records)
    // ...

    /**
     * @param string $baseValue
     * @param array $colValues
     * @param array $targets
     */
    public function getGiniIndex($baseValue, $colValues, $targets)
    // ...

    /**
     * @param array $samples
     * @return array
     */
    protected function preprocess(array $samples)
    // ...

    /**
     * @param array $columnValues
     * @return bool
     */
    protected function isCategoricalColumn(array $columnValues)
    // ...

    /**
     * @return string
     */
    public function getHtml()
    // ...

    /**
     * @param array $sample
     * @return mixed
     */
    protected function predictSample(array $sample)
    // ...
}
This check looks at variables that are received as parameters and then passed on to other methods.
If the outgoing method call has stricter type requirements than the method itself, an issue is raised.
An additional type check may prevent trouble.
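The situation described here can be sketched as follows. This is a minimal, made-up example: the stub Classifier interface, the DepthLimitedTree class, and the functions requiresTree and describe are hypothetical names chosen for illustration. describe() accepts any Classifier, while requiresTree() demands the narrower DepthLimitedTree, so an instanceof guard is added before the value is passed on.

```php
<?php
// Stub interface for this sketch only.
interface Classifier
{
    public function predict(array $sample);
}

// Hypothetical concrete classifier that offers more than the interface.
final class DepthLimitedTree implements Classifier
{
    public function predict(array $sample)
    {
        return count($sample);
    }

    public function depth(): int
    {
        return 3;
    }
}

// The outgoing call: stricter than Classifier.
function requiresTree(DepthLimitedTree $tree): int
{
    return $tree->depth();
}

// The incoming parameter is only known to be a Classifier.
function describe(Classifier $classifier): string
{
    // Without this guard, passing any other Classifier on to
    // requiresTree() would fail with a TypeError at runtime.
    if (!$classifier instanceof DepthLimitedTree) {
        return 'depth unknown';
    }

    return 'depth ' . requiresTree($classifier);
}

// Another Classifier implementation that is not a DepthLimitedTree.
$other = new class implements Classifier {
    public function predict(array $sample)
    {
        return null;
    }
};

$known = describe(new DepthLimitedTree());
$unknown = describe($other);
```

An alternative to the guard is to tighten the parameter type of describe() itself, which is preferable when no caller actually passes anything other than the narrower type.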