Passed
Pull Request — master (#82)

LDA::calculateMeans()   C

Complexity: Conditions 7, Paths 21
Size: Total Lines 39, Code Lines 23
Duplication: Lines 0, Ratio 0 %
Importance: Changes 0

Metric  Value
dl      0
loc     39
rs      6.7272
c       0
b       0
f       0
cc      7
eloc    23
nc      21
nop     2

<?php

declare(strict_types=1);

namespace Phpml\DimensionReduction;

use Phpml\Math\Statistic\Mean;
use Phpml\Math\Matrix;

class LDA extends EigenTransformerBase
{
    /**
     * @var bool
     */
    public $fit = false;

    /**
     * @var array
     */
    public $labels;

    /**
     * @var array
     */
    public $means;

    /**
     * @var array
     */
    public $counts;

    /**
     * @var float
     */
    public $overallMean;

    /**
     * Linear Discriminant Analysis (LDA) is used to reduce the dimensionality
     * of the data. Unlike Principal Component Analysis (PCA), it is a supervised
     * technique that requires the class labels in order to fit the data to a
     * lower dimensional space. <br><br>
     * The algorithm can be initialized by specifying either
     * totalVariance (a value between 0.1 and 0.99)
     * or numFeatures (number of features in the dataset) to be preserved,
     * but not both.
     *
     * @param float|null $totalVariance Total explained variance to be preserved
     * @param int|null $numFeatures Number of features to be preserved
     *
     * @throws \Exception
     */
    public function __construct($totalVariance = null, $numFeatures = null)
Duplication: This method seems to be duplicated in your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.
    {
        if ($totalVariance !== null && ($totalVariance < 0.1 || $totalVariance > 0.99)) {
            throw new \Exception("Total variance can be a value between 0.1 and 0.99");
        }
        if ($numFeatures !== null && $numFeatures <= 0) {
            throw new \Exception("Number of features to be preserved should be greater than 0");
        }
        if ($totalVariance !== null && $numFeatures !== null) {
            throw new \Exception("Either totalVariance or numFeatures should be specified in order to run the algorithm");
        }

        if ($numFeatures !== null) {
            $this->numFeatures = $numFeatures;
        }
        if ($totalVariance !== null) {
            $this->totalVariance = $totalVariance;
        }
    }

    /**
     * Trains the algorithm to transform the given data to a lower dimensional space.
     *
     * @param array $data
     * @param array $classes
     *
     * @return array
     */
    public function fit(array $data, array $classes) : array
    {
        $this->labels = $this->getLabels($classes);
        $this->means  = $this->calculateMeans($data, $classes);
Documentation: $data is of type array, but the function expects an object<Phpml\DimensionReduction\type>.

It seems like the type of the argument is not accepted by the function/method which you are calling.

In some cases, in particular if PHP’s automatic type-juggling kicks in, this might be fine. In other cases, however, this might be a bug.

We suggest adding an explicit type cast, as in the following example:

function acceptsInteger($int) { }

$x = '123'; // string "123"

// Instead of
acceptsInteger($x);

// we recommend to use
acceptsInteger((integer) $x);
Documentation: $classes is of type array, but the function expects an object<Phpml\DimensionReduction\type>.

        // Within-class (sW) and between-class (sB) scatter matrices
        $sW = $this->calculateClassVar($data, $classes);
        $sB = $this->calculateClassCov();

        // Eigen-decompose inverse(sW) * sB
        $S = $sW->inverse()->multiply($sB);
        $this->eigenDecomposition($S->toArray());

        return $this->reduce($data);
    }

    /**
     * Returns unique labels in the dataset
     *
     * @param array $classes
     *
     * @return array
     */
    protected function getLabels(array $classes): array
    {
        $counts = array_count_values($classes);

        return array_keys($counts);
    }

    /**
     * Calculates the mean of each column for each class and returns an
     * n by m matrix, where n is the number of labels and m is the number
     * of columns. Also stores the per-class counts and the overall column means.
     *
     * @param type $data
     * @param type $classes
     *
     * @return array
     */
    protected function calculateMeans($data, $classes) : array
    {
        $means = [];
        $counts = [];
        $overallMean = array_fill(0, count($data[0]), 0.0);

        // Accumulate per-class and overall column sums in one pass
        foreach ($data as $index => $row) {
            $label = array_search($classes[$index], $this->labels);

            foreach ($row as $col => $val) {
                if (! isset($means[$label][$col])) {
                    $means[$label][$col] = 0.0;
                }
                $means[$label][$col] += $val;
                $overallMean[$col] += $val;
            }

            if (! isset($counts[$label])) {
                $counts[$label] = 0;
            }
            $counts[$label]++;
        }

        // Turn the per-class column sums into means
        foreach ($means as $index => $row) {
            foreach ($row as $col => $sum) {
                $means[$index][$col] = $sum / $counts[$index];
            }
        }

        // Calculate overall mean of the dataset for each column
        $numElements = array_sum($counts);
        $map = function ($el) use ($numElements) {
            return $el / $numElements;
        };
        $this->overallMean = array_map($map, $overallMean);
Documentation Bug: It seems like array_map($map, $overallMean) of type array is incompatible with the declared type double of property $overallMean.

Our type inference engine has found an assignment to a property that is incompatible with the declared type of that property.

Either this assignment is in error or the assigned type should be added to the documentation/type hint for that property.
        $this->counts = $counts;

        return $means;
    }

    /**
     * Returns the within-class scatter matrix, summed over all classes,
     * which is an m by m matrix where m is the number of columns
     *
     * @param array $data
     * @param array $classes
     *
     * @return Matrix
     */
    protected function calculateClassVar($data, $classes)
    {
        // s is an m by m (m = number of columns) zero matrix
        $s = array_fill(0, count($data[0]), array_fill(0, count($data[0]), 0));
        $sW = new Matrix($s, false);

        foreach ($data as $index => $row) {
            $label = array_search($classes[$index], $this->labels);
            $means = $this->means[$label];

            $row = $this->calculateVar($row, $means);

            $sW = $sW->add($row);
        }

        return $sW;
    }

    /**
     * Returns the between-class scatter matrix, summed over all classes,
     * which is an m by m matrix where m is the number of columns
     *
     * @return Matrix
     */
    protected function calculateClassCov()
    {
        // s is an m by m (m = number of columns) zero matrix
        $s = array_fill(0, count($this->overallMean), array_fill(0, count($this->overallMean), 0));
        $sB = new Matrix($s, false);

        foreach ($this->means as $index => $classMeans) {
            $row = $this->calculateVar($classMeans, $this->overallMean);
Documentation: $this->overallMean is of type double, but the function expects an array.
            $N = $this->counts[$index];
            $sB = $sB->add($row->multiplyByScalar($N));
        }

        return $sB;
    }

    /**
     * Returns the result of the calculation (x - m)T.(x - m)
     *
     * @param array $row
     * @param array $means
     *
     * @return Matrix
     */
    protected function calculateVar(array $row, array $means)
    {
        $x = new Matrix($row, false);
        $m = new Matrix($means, false);
        $diff = $x->subtract($m);

        return $diff->transpose()->multiply($diff);
    }
}
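
Read together, fit(), calculateClassVar(), calculateClassCov() and calculateVar() appear to compute the quantities below. This is a sketch inferred from the listing, treating each sample x as a 1-by-m row vector. The symbols D_c (samples of class c), N_c (their count), mu_c (class mean) and mu (overall mean) are notation introduced here, not names from the code.

```latex
\begin{aligned}
S_W &= \sum_{c} \sum_{x \in D_c} (x - \mu_c)^{T} (x - \mu_c) \\
S_B &= \sum_{c} N_c \, (\mu_c - \mu)^{T} (\mu_c - \mu) \\
S   &= S_W^{-1} S_B
\end{aligned}
```

Each term is an m-by-m outer product, so S_W, S_B and S are all m-by-m, where m is the number of columns. S is the matrix that fit() passes to eigenDecomposition().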
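
To make the (x - m)T.(x - m) step in calculateVar() concrete, here is a minimal sketch using plain PHP arrays instead of the Phpml Matrix class. The function name scatterOfRow() is hypothetical and not part of the library; it only illustrates the outer product that the method appears to compute for a single row.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Phpml): computes the outer product
 * (x - m)^T (x - m) for one row $x and a mean vector $m of equal length.
 */
function scatterOfRow(array $x, array $m): array
{
    // Element-wise difference between the row and the means
    $diff = [];
    foreach ($x as $i => $value) {
        $diff[$i] = $value - $m[$i];
    }

    // Outer product of the difference with itself: an m-by-m matrix
    $result = [];
    foreach ($diff as $i => $di) {
        foreach ($diff as $j => $dj) {
            $result[$i][$j] = $di * $dj;
        }
    }

    return $result;
}

$scatter = scatterOfRow([3.0, 5.0], [1.0, 2.0]);
// diff is [2.0, 3.0], so $scatter is [[4.0, 6.0], [6.0, 9.0]]
```

Summing such matrices over all rows, with each row's own class mean as $m, would give the within-class scatter matrix described in calculateClassVar().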