Test Failed
Pull Request — master (#53)
by unknown | 02:42 | created

DecisionStump::evaluate()   B

Complexity:   Conditions 8, Paths 8
Size:         Total Lines 14, Code Lines 10
Duplication:  Lines 0, Ratio 0 %
Importance:   Changes 0

Metric  Value
dl      0
loc     14
rs      7.7777
c       0
b       0
f       0
cc      8
eloc    10
nc      8
nop     3

<?php

declare(strict_types=1);

namespace Phpml\Classification\Linear;

use Phpml\Helper\Predictable;
use Phpml\Helper\Trainable;
use Phpml\Classification\WeightedClassifier;
use Phpml\Classification\DecisionTree;

class DecisionStump extends WeightedClassifier
{
    use Trainable, Predictable;

    /**
     * @var int
     */
    protected $givenColumnIndex;

    /**
     * Sample weights: if given, the optimization of the decision value
     * takes these weights into account. If not given, all samples
     * are weighted with the same value of 1.
     *
     * @var array
     */
    protected $weights = null;

    /**
     * Lowest error rate obtained while training/optimizing the model
     *
     * @var float
     */
    protected $trainingErrorRate;

    /**
     * @var int
     */
    protected $column;

    /**
     * @var mixed
     */
    protected $value;

    /**
     * @var string
     */
    protected $operator;

    /**
     * @var array
     */
    protected $columnTypes;

    /**
     * A DecisionStump classifier is a one-level deep DecisionTree. It is generally
     * used with ensemble algorithms in the weak classifier role.
     *
     * If columnIndex is given, the stump tries to produce a decision node
     * on this column. Otherwise (the default value of -1), the stump itself
     * decides which column to take for the decision (default DecisionTree behaviour).
     *
     * @param int $columnIndex
     */
    public function __construct(int $columnIndex = -1)
    {
        $this->givenColumnIndex = $columnIndex;
    }

    /**
     * @param array $samples
     * @param array $targets
     */
    public function train(array $samples, array $targets)
    {
        $this->samples = array_merge($this->samples, $samples);
        $this->targets = array_merge($this->targets, $targets);

        // DecisionStump is capable of classifying between two classes only
        $labels = array_count_values($this->targets);
        $this->labels = array_keys($labels);

Bug: The property labels does not exist. Did you maybe forget to declare it?

In PHP it is possible to write to properties without declaring them. For example, the following is perfectly valid PHP code:

class MyClass { }

$x = new MyClass();
$x->foo = true;

Generally, it is a good practice to explicitly declare properties to avoid accidental typos and provide IDE auto-completion:

class MyClass {
    public $foo;
}

$x = new MyClass();
$x->foo = true;

        if (count($this->labels) != 2) {
            throw new \Exception("DecisionStump can classify between two classes only:" . implode(',', $this->labels));
        }

        // If a column index is given, it should be among the existing columns
        if ($this->givenColumnIndex > count($samples[0]) - 1) {
            $this->givenColumnIndex = -1;
        }

        // Check the size of the weights given.
        // If none given, then assign 1 as a weight to each sample
        if ($this->weights) {

Bug, Best Practice: The expression $this->weights of type array is implicitly converted to a boolean; are you sure this is intended? If so, consider using ! empty($expr) instead to make it clear that you intend to check for an array without elements.

This check marks implicit conversions of arrays to boolean values in a comparison. While in PHP an empty array is considered to be equal (but not identical) to false, this is not always apparent.

Consider making the comparison explicit by using empty(...) or ! empty(...) instead.

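A minimal sketch of the explicit check this note suggests, written as a standalone function so it can be run on its own. The name resolveWeights and its parameters are hypothetical and not part of the reviewed class; the use of InvalidArgumentException is also an assumption (the reviewed code throws a plain \Exception).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the reviewed class): returns the given
 * weights, or a weight of 1 per sample when none were supplied.
 */
function resolveWeights(?array $weights, int $sampleCount): array
{
    // Explicit emptiness check instead of an implicit array-to-bool conversion
    if (!empty($weights)) {
        if (count($weights) !== $sampleCount) {
            throw new InvalidArgumentException(
                'Number of sample weights does not match with number of samples'
            );
        }

        return $weights;
    }

    // No weights given: every sample gets the same weight of 1
    return array_fill(0, $sampleCount, 1);
}
```

Both null and an empty array take the default branch here, which mirrors what the implicit boolean conversion in the reviewed code does, only spelled out.
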
            $numWeights = count($this->weights);
            if ($numWeights != count($this->samples)) {
                throw new \Exception("Number of sample weights does not match with number of samples");
            }
        } else {
            $this->weights = array_fill(0, count($samples), 1);
        }

        // Determine type of each column as either "continuous" or "nominal"
        $this->columnTypes = DecisionTree::getColumnTypes($this->samples);

        // Try to find the best split in the columns of the dataset
        // by calculating error rate for each split point in each column
        $columns = range(0, count($samples[0]) - 1);
        if ($this->givenColumnIndex != -1) {
            $columns = [$this->givenColumnIndex];
        }

        $bestSplit = [
            'value' => 0, 'operator' => '',
            'column' => 0, 'trainingErrorRate' => 1.0];
        foreach ($columns as $col) {
            if ($this->columnTypes[$col] == DecisionTree::CONTINUOS) {
                $split = $this->getBestNumericalSplit($col);
            } else {
                $split = $this->getBestNominalSplit($col);
            }

            if ($split['trainingErrorRate'] < $bestSplit['trainingErrorRate']) {
                $bestSplit = $split;
            }
        }

        // Assign determined best values to the stump
        foreach ($bestSplit as $name => $value) {
            $this->{$name} = $value;
        }
    }

    /**
     * Determines best split point for the given column
     *
     * @param int $col
     *
     * @return array
     */
    protected function getBestNumericalSplit(int $col)
    {
        $values = array_column($this->samples, $col);
        $minValue = min($values);
        $maxValue = max($values);
        $stepSize = ($maxValue - $minValue) / 10.0;

        $split = null;

        foreach (['<=', '>'] as $operator) {
            // Before trying all possible split points, let's first try
            // the average value for the cut point
            $threshold = array_sum($values) / (float) count($values);
            $errorRate = $this->calculateErrorRate($threshold, $operator, $values);
            if ($split == null || $errorRate < $split['trainingErrorRate']) {

Duplication: This code seems to be duplicated across your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

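One possible way to act on this note is to move the repeated "keep the split with the lower error rate" comparison into a single helper. The sketch below is only an illustration: keepBetterSplit is a hypothetical name that does not exist in the reviewed code, and it is written as a standalone function rather than a class method so that it runs on its own.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: returns the candidate split if there is no best
 * split yet or if the candidate has a lower training error rate;
 * otherwise returns the current best split unchanged.
 */
function keepBetterSplit(?array $best, array $candidate): array
{
    if ($best === null || $candidate['trainingErrorRate'] < $best['trainingErrorRate']) {
        return $candidate;
    }

    return $best;
}
```

Each of the flagged blocks could then reduce to a single call that passes the current best split and a freshly built candidate array.
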
                $split = ['value' => $threshold, 'operator' => $operator,
                        'column' => $col, 'trainingErrorRate' => $errorRate];
            }

            // Try other possible points one by one
            for ($step = $minValue; $step <= $maxValue; $step += $stepSize) {
                $threshold = (float) $step;
                $errorRate = $this->calculateErrorRate($threshold, $operator, $values);
                if ($errorRate < $split['trainingErrorRate']) {
                    $split = ['value' => $threshold, 'operator' => $operator,
                        'column' => $col, 'trainingErrorRate' => $errorRate];
                }
            }// for
        }

        return $split;
    }

    /**
     * Determines best split point for the given nominal column
     *
     * @param int $col
     *
     * @return array
     */
    protected function getBestNominalSplit(int $col)
    {
        $values = array_column($this->samples, $col);
        $valueCounts = array_count_values($values);
        $distinctVals = array_keys($valueCounts);

        $split = null;

        foreach (['=', '!='] as $operator) {
            foreach ($distinctVals as $val) {
                $errorRate = $this->calculateErrorRate($val, $operator, $values);

                // Keep the split with the lower error rate, as in getBestNumericalSplit()
                if ($split == null || $errorRate < $split['trainingErrorRate']) {
                    $split = ['value' => $val, 'operator' => $operator,
                        'column' => $col, 'trainingErrorRate' => $errorRate];
                }
            }// for
        }

        return $split;
    }

    /**
     *
     * @param type $lVal
     * @param type $op
     * @param type $rVal
     *
     * @return boolean
     */
    protected function evaluate($lVal, $op, $rVal)
    {
        switch ($op) {
            case '>': return $lVal > $rVal;

Coding Style: The case body in a switch statement must start on the line following the statement.

According to the PSR-2, the body of a case statement must start on the line immediately following the case statement.

switch ($expr) {
case "A":
    doSomething(); //right
    break;
case "B":

    doSomethingElse(); //wrong
    break;

}

To learn more about the PSR-2 coding standard, please refer to the PHP-Fig.

Coding Style: Terminating statement must be on a line by itself

As per the PSR-2 coding standard, the break (or other terminating) statement must be on a line of its own.

switch ($expr) {
     case "A":
         doSomething();
         break; //wrong
     case "B":
         doSomething();
         break; //right
     case "C:":
         doSomething();
         return true; //right
 }

To learn more about the PSR-2 coding standard, please refer to the PHP-Fig.

Both findings are reported for every case line in this switch that returns on the same line.

            case '>=': return $lVal >= $rVal;
            case '<': return $lVal < $rVal;
            case '<=': return $lVal <= $rVal;
            case '=': return $lVal == $rVal;
            case '!=':
            case '<>': return $lVal != $rVal;
        }

        return false;
    }

    /**
     * Calculates the ratio of wrong predictions based on the new threshold
     * value given as the parameter
     *
     * @param float $threshold
     * @param string $operator
     * @param array $values
     *
     * @return float
     */
    protected function calculateErrorRate(float $threshold, string $operator, array $values)
    {
        $total = (float) array_sum($this->weights);
        $wrong = 0.0;
        $leftLabel = $this->labels[0];
        $rightLabel = $this->labels[1];
        foreach ($values as $index => $value) {
            if ($this->evaluate($threshold, $operator, $value)) {

Documentation: $threshold is of type double, but the function expects an object<Phpml\Classification\Linear\type>.

It seems like the type of the argument is not accepted by the function/method which you are calling.

In some cases, in particular if PHP’s automatic type-juggling kicks in this might be fine. In other cases, however this might be a bug.

We suggest to add an explicit type cast like in the following example:

function acceptsInteger($int) { }

$x = '123'; // string "123"

// Instead of
acceptsInteger($x);

// we recommend to use
acceptsInteger((integer) $x);

Documentation: $operator is of type string, but the function expects an object<Phpml\Classification\Linear\type>. The same explanation and suggestion apply.

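The expected type object<Phpml\Classification\Linear\type> in these findings appears to come from the placeholder "@param type" tags in the evaluate() docblock, which the analyzer reads as a class named type in the current namespace. As a hedged sketch, and not the project's actual fix, the same comparison logic could be documented with concrete types as below. It is written as a standalone function so it runs on its own, and the string type hint on $op and the bool return type are assumptions added for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Sketch of evaluate() with concrete docblock types (standalone function
 * for illustration; the reviewed version is a protected method).
 *
 * @param mixed  $lVal left operand
 * @param string $op   one of '>', '>=', '<', '<=', '=', '!=', '<>'
 * @param mixed  $rVal right operand
 *
 * @return bool result of the comparison, or false for an unknown operator
 */
function evaluate($lVal, string $op, $rVal): bool
{
    switch ($op) {
        case '>':
            return $lVal > $rVal;
        case '>=':
            return $lVal >= $rVal;
        case '<':
            return $lVal < $rVal;
        case '<=':
            return $lVal <= $rVal;
        case '=':
            return $lVal == $rVal;
        case '!=':
        case '<>':
            return $lVal != $rVal;
    }

    return false;
}
```

Each case body sits on its own line here, so this layout also follows the PSR-2 switch rules.
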
                $predicted = $leftLabel;
            } else {
                $predicted = $rightLabel;
            }

            if ($predicted != $this->targets[$index]) {
                $wrong += $this->weights[$index];
            }
        }

        return $wrong / $total;
    }

    /**
     * @param array $sample
     * @return mixed
     */
    protected function predictSample(array $sample)
    {
        if ($this->evaluate($this->value, $this->operator, $sample[$this->column])) {

Documentation: $this->operator is of type string, but the function expects an object<Phpml\Classification\Linear\type>.

It seems like the type of the argument is not accepted by the function/method which you are calling.

            return $this->labels[0];
        }
        return $this->labels[1];
    }

    public function __toString()
    {
        return "$this->column $this->operator $this->value";
    }
}