Passed
Push — master ( 12b8b1...5b373f )
by Arkadiusz
04:53
created

LDA::fit()   A

Complexity

Conditions 1
Paths 1

Size

Total Lines 15
Code Lines 9

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0

Metric  Value
dl      0
loc     15
rs      9.4285
c       0
b       0
f       0
cc      1
eloc    9
nc      1
nop     2
<?php

declare(strict_types=1);

namespace Phpml\DimensionReduction;

use Phpml\Math\Statistic\Mean;
use Phpml\Math\Matrix;

class LDA extends EigenTransformerBase
{
    /**
     * @var bool
     */
    public $fit = false;

    /**
     * @var array
     */
    public $labels;

    /**
     * @var array
     */
    public $means;

    /**
     * @var array
     */
    public $counts;

    /**
     * @var float
     */
    public $overallMean;

    /**
     * Linear Discriminant Analysis (LDA) is used to reduce the dimensionality
     * of the data. Unlike Principal Component Analysis (PCA), it is a supervised
     * technique that requires the class labels in order to fit the data to a
     * lower dimensional space. <br><br>
     * The algorithm can be initialized by specifying
     * either totalVariance (a value between 0.1 and 0.99)
     * or numFeatures (number of features in the dataset) to be preserved.
     *
     * @param float|null $totalVariance Total explained variance to be preserved
     * @param int|null $numFeatures Number of features to be preserved
     *
     * @throws \Exception
     */
    public function __construct($totalVariance = null, $numFeatures = null)
    // Review note (Duplication): this method seems to be duplicated in your project.
    {
        if ($totalVariance !== null && ($totalVariance < 0.1 || $totalVariance > 0.99)) {
            throw new \Exception("Total variance can be a value between 0.1 and 0.99");
        }
        if ($numFeatures !== null && $numFeatures <= 0) {
            throw new \Exception("Number of features to be preserved should be greater than 0");
        }
        if ($totalVariance !== null && $numFeatures !== null) {
            throw new \Exception("Either totalVariance or numFeatures should be specified in order to run the algorithm");
        }

        if ($numFeatures !== null) {
            $this->numFeatures = $numFeatures;
        }
        if ($totalVariance !== null) {
            $this->totalVariance = $totalVariance;
        }
    }

    /**
     * Trains the algorithm to transform the given data to a lower dimensional space.
     *
     * @param array $data
     * @param array $classes
     *
     * @return array
     */
    public function fit(array $data, array $classes) : array
    {
        $this->labels = $this->getLabels($classes);
        $this->means  = $this->calculateMeans($data, $classes);
        // Review note (Documentation): $data is of type array, but the function expects an object<Phpml\DimensionReduction\type>.
        // Review note (Documentation): $classes is of type array, but the function expects an object<Phpml\DimensionReduction\type>.

        $sW = $this->calculateClassVar($data, $classes);
        $sB = $this->calculateClassCov();

        $S = $sW->inverse()->multiply($sB);
        $this->eigenDecomposition($S->toArray());

        $this->fit = true;

        return $this->reduce($data);
    }

    /**
     * Returns unique labels in the dataset
     *
     * @param array $classes
     *
     * @return array
     */
    protected function getLabels(array $classes): array
    {
        $counts = array_count_values($classes);

        return array_keys($counts);
    }

    /**
     * Calculates mean of each column for each class and returns
     * n by m matrix where n is number of labels and m is number of columns
     *
     * @param type $data
     * @param type $classes
     *
     * @return array
     */
    protected function calculateMeans($data, $classes) : array
    {
        $means = [];
        $counts = [];
        $overallMean = array_fill(0, count($data[0]), 0.0);

        foreach ($data as $index => $row) {
            $label = array_search($classes[$index], $this->labels);

            foreach ($row as $col => $val) {
                if (! isset($means[$label][$col])) {
                    $means[$label][$col] = 0.0;
                }
                $means[$label][$col] += $val;
                $overallMean[$col] += $val;
            }

            if (! isset($counts[$label])) {
                $counts[$label] = 0;
            }
            $counts[$label]++;
        }

        foreach ($means as $index => $row) {
            foreach ($row as $col => $sum) {
                $means[$index][$col] = $sum / $counts[$index];
            }
        }

        // Calculate overall mean of the dataset for each column
        $numElements = array_sum($counts);
        $map = function ($el) use ($numElements) {
            return $el / $numElements;
        };
        $this->overallMean = array_map($map, $overallMean);
        // Review note (Documentation Bug): array_map($map, $overallMean) of type array is incompatible with the declared type double of property $overallMean.
        $this->counts = $counts;

        return $means;
    }

    /**
     * Returns the within-class scatter matrix, accumulated over all
     * samples, which is an m by m matrix where m is the number
     * of columns
     *
     * @param array $data
     * @param array $classes
     *
     * @return Matrix
     */
    protected function calculateClassVar($data, $classes)
    {
        // s is an m by m zero matrix, where m is the number of columns
        $s = array_fill(0, count($data[0]), array_fill(0, count($data[0]), 0));
        $sW = new Matrix($s, false);

        foreach ($data as $index => $row) {
            $label = array_search($classes[$index], $this->labels);
            $means = $this->means[$label];

            $row = $this->calculateVar($row, $means);

            $sW = $sW->add($row);
        }

        return $sW;
    }

    /**
     * Returns the between-class scatter matrix, accumulated over all
     * classes, which is an m by m matrix where m is the number
     * of columns
     *
     * @return Matrix
     */
    protected function calculateClassCov()
    {
        // s is an m by m zero matrix, where m is the number of columns
        $s = array_fill(0, count($this->overallMean), array_fill(0, count($this->overallMean), 0));
        $sB = new Matrix($s, false);

        foreach ($this->means as $index => $classMeans) {
            $row = $this->calculateVar($classMeans, $this->overallMean);
            // Review note (Documentation): $this->overallMean is of type double, but the function expects an array.
            $N = $this->counts[$index];
            $sB = $sB->add($row->multiplyByScalar($N));
        }

        return $sB;
    }

    /**
     * Returns the result of the calculation (x - m)T.(x - m)
     *
     * @param array $row
     * @param array $means
     *
     * @return Matrix
     */
    protected function calculateVar(array $row, array $means)
    {
        $x = new Matrix($row, false);
        $m = new Matrix($means, false);
        $diff = $x->subtract($m);

        return $diff->transpose()->multiply($diff);
    }

    /**
     * Transforms the given sample to a lower dimensional vector by using
     * the eigenVectors obtained in the last run of <code>fit</code>.
     *
     * @param array $sample
     *
     * @return array
     */
    public function transform(array $sample)
    // Review note (Duplication): this method seems to be duplicated in your project.
    {
        if (!$this->fit) {
            throw new \Exception("LDA has not been fitted with respect to original dataset, please run LDA::fit() first");
        }

        if (! is_array($sample[0])) {
            $sample = [$sample];
        }

        return $this->reduce($sample);
    }
}
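
As a rough illustration of what fit() accumulates before the eigen-decomposition, the sketch below rebuilds the within-class and between-class scatter matrices with plain PHP arrays instead of the Phpml\Math\Matrix class. The function names outerProduct, addScaled and scatterMatrices are hypothetical helpers introduced only for this example; they are not part of the library, and the sketch skips the matrix inversion and eigen-decomposition steps.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: outer product d^T d of a row vector d (m-by-m result).
function outerProduct(array $d): array
{
    $result = [];
    foreach ($d as $i => $di) {
        foreach ($d as $j => $dj) {
            $result[$i][$j] = $di * $dj;
        }
    }

    return $result;
}

// Hypothetical helper: returns $a + $scale * $b for equally sized matrices.
function addScaled(array $a, array $b, float $scale = 1.0): array
{
    foreach ($b as $i => $row) {
        foreach ($row as $j => $val) {
            $a[$i][$j] += $scale * $val;
        }
    }

    return $a;
}

// Hypothetical helper: computes both scatter matrices for labelled rows.
function scatterMatrices(array $data, array $classes): array
{
    $m = count($data[0]);
    $zero = array_fill(0, $m, array_fill(0, $m, 0.0));

    // Column sums per class and over the whole dataset
    $sums = [];
    $counts = [];
    $overall = array_fill(0, $m, 0.0);
    foreach ($data as $i => $row) {
        $label = $classes[$i];
        $sums[$label] = $sums[$label] ?? array_fill(0, $m, 0.0);
        $counts[$label] = ($counts[$label] ?? 0) + 1;
        foreach ($row as $col => $val) {
            $sums[$label][$col] += $val;
            $overall[$col] += $val;
        }
    }

    $subtract = function ($a, $b) {
        return $a - $b;
    };
    $n = count($data);
    $overallMean = array_map(function ($s) use ($n) {
        return $s / $n;
    }, $overall);

    $means = [];
    foreach ($sums as $label => $sum) {
        $c = $counts[$label];
        $means[$label] = array_map(function ($s) use ($c) {
            return $s / $c;
        }, $sum);
    }

    // Within-class scatter: sum of (x - classMean)^T (x - classMean) over all rows
    $sW = $zero;
    foreach ($data as $i => $row) {
        $diff = array_map($subtract, $row, $means[$classes[$i]]);
        $sW = addScaled($sW, outerProduct($diff));
    }

    // Between-class scatter: class-size-weighted sum of
    // (classMean - overallMean)^T (classMean - overallMean)
    $sB = $zero;
    foreach ($means as $label => $mean) {
        $diff = array_map($subtract, $mean, $overallMean);
        $sB = addScaled($sB, outerProduct($diff), (float) $counts[$label]);
    }

    return ['sW' => $sW, 'sB' => $sB];
}

// Two classes with two samples each; for this data sW is [[4, 0], [0, 4]]
// and sB is [[16, 16], [16, 16]].
$result = scatterMatrices(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 8.0], [7.0, 6.0]],
    ['a', 'a', 'b', 'b']
);
var_export($result);
```

Both matrices are m by m, where m is the number of columns, which is why the class above can invert the within-class matrix and multiply it by the between-class one before calling eigenDecomposition().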