Passed
Push — master ( 12b8b1...5b373f )
by Arkadiusz
04:53
created

src/Phpml/DimensionReduction/PCA.php (1 issue)

<?php

declare(strict_types=1);

namespace Phpml\DimensionReduction;

use Phpml\Math\Statistic\Covariance;
use Phpml\Math\Statistic\Mean;
use Phpml\Math\Matrix;

class PCA extends EigenTransformerBase
{
    /**
     * Temporary storage for the mean value of each dimension in the given data
     *
     * @var array
     */
    protected $means = [];

    /**
     * @var bool
     */
    protected $fit = false;

    /**
     * PCA (Principal Component Analysis) explains the given data with a
     * lower number of dimensions. It transforms the data to a
     * lower-dimensional version while conserving a proportion of the total
     * variance within the data. It is a lossy data compression technique.
     *
     * @param float|null $totalVariance Total explained variance to be preserved (0.1 to 0.99)
     * @param int|null $numFeatures Number of features to be preserved
     *
     * @throws \Exception
     */
    public function __construct($totalVariance = null, $numFeatures = null)
This method seems to be duplicated in your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

    {
        // Validate each option on its own.
        if ($totalVariance !== null && ($totalVariance < 0.1 || $totalVariance > 0.99)) {
            throw new \Exception("Total variance can be a value between 0.1 and 0.99");
        }
        if ($numFeatures !== null && $numFeatures <= 0) {
            throw new \Exception("Number of features to be preserved should be greater than 0");
        }
        // The two options are mutually exclusive.
        if ($totalVariance !== null && $numFeatures !== null) {
            throw new \Exception("Either totalVariance or numFeatures should be specified in order to run the algorithm");
        }

        if ($numFeatures !== null) {
            $this->numFeatures = $numFeatures;
        }
        if ($totalVariance !== null) {
            $this->totalVariance = $totalVariance;
        }
    }

    /**
     * Takes a data set and returns a lower-dimensional version
     * of it while preserving $totalVariance or $numFeatures.
     * $data is an n-by-m matrix and the returned array is an
     * n-by-k matrix where k <= m.
     *
     * @param array $data
     *
     * @return array
     */
    public function fit(array $data)
    {
        // Number of columns (dimensions) in the input
        $n = count($data[0]);

        // Center every column on zero
        $data = $this->normalize($data, $n);

        // The data is already centered, so the means passed here are all zero
        $covMatrix = Covariance::covarianceMatrix($data, array_fill(0, $n, 0));

        $this->eigenDecomposition($covMatrix);

        $this->fit = true;

        return $this->reduce($data);
    }

    /**
     * Calculates the mean of each of the first $n columns of $data
     * and stores the results in $this->means.
     *
     * @param array $data
     * @param int $n
     */
    protected function calculateMeans(array $data, int $n)
    {
        // Calculate means for each dimension
        $this->means = [];
        for ($i = 0; $i < $n; $i++) {
            $column = array_column($data, $i);
            $this->means[] = Mean::arithmetic($column);
        }
    }

    /**
     * Normalizes the data by subtracting the mean of each dimension,
     * so that every dimension is centered on zero.
     *
     * @param array $data
     * @param int $n
     *
     * @return array
     */
    protected function normalize(array $data, int $n)
    {
        // Means are computed only once, on the first call, and reused afterwards
        if (empty($this->means)) {
            $this->calculateMeans($data, $n);
        }

        // Subtract the stored mean of each column from every row
        foreach ($data as $i => $row) {
            for ($k = 0; $k < $n; $k++) {
                $data[$i][$k] -= $this->means[$k];
            }
        }

        return $data;
    }

    /**
     * Transforms the given sample to a lower-dimensional vector by using
     * the eigenvectors obtained in the last run of fit().
     *
     * @param array $sample
     *
     * @return array
     *
     * @throws \Exception
     */
    public function transform(array $sample)
    {
        if (!$this->fit) {
            throw new \Exception("PCA has not been fitted with respect to original dataset, please run PCA::fit() first");
        }

        // Wrap a single sample so that it is handled as a one-row matrix
        if (!is_array($sample[0])) {
            $sample = [$sample];
        }

        $sample = $this->normalize($sample, count($sample[0]));

        return $this->reduce($sample);
    }
}
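
The centering step in normalize(), and the way transform() reuses the stored means, can be sketched without the library. This is a minimal standalone sketch, not the library's code: columnMeans() and center() are hypothetical helpers, and the sample values are made up for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Mean of every column of a row-major matrix.
 * Hypothetical helper, not part of php-ml.
 */
function columnMeans(array $data): array
{
    $n = count($data[0]);
    $means = [];
    for ($i = 0; $i < $n; $i++) {
        $column = array_column($data, $i);
        $means[] = array_sum($column) / count($column);
    }

    return $means;
}

/**
 * Subtracts the given per-column means from every row.
 * Hypothetical helper, not part of php-ml.
 */
function center(array $data, array $means): array
{
    foreach ($data as $i => $row) {
        foreach ($row as $k => $value) {
            $data[$i][$k] = $value - $means[$k];
        }
    }

    return $data;
}

// Illustrative training data: three rows, two dimensions.
$train = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]];

$means = columnMeans($train);
$centered = center($train, $means);

// A later sample is shifted by the training means, not by its own mean.
$sample = center([[4.0, 25.0]], $means);
```

Reusing the training means keeps a transformed sample in the same coordinate system as the data the eigenvectors were computed from.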
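
One possible way to address the duplication flagged on the constructor is to move the argument checks into a single shared method. The sketch below is an assumption about how that could look, not the project's actual fix: ReducerBase, setReductionTarget() and DemoReducer are hypothetical names, and the sketch throws \InvalidArgumentException where the original throws \Exception.

```php
<?php

declare(strict_types=1);

// Hypothetical shared base class; it stands in for whatever parent the
// duplicated constructors have in common.
abstract class ReducerBase
{
    /** @var float|null */
    protected $totalVariance;

    /** @var int|null */
    protected $numFeatures;

    /**
     * Validates the two mutually exclusive options and stores the one given.
     *
     * @throws \InvalidArgumentException
     */
    protected function setReductionTarget(?float $totalVariance, ?int $numFeatures): void
    {
        if ($totalVariance !== null && ($totalVariance < 0.1 || $totalVariance > 0.99)) {
            throw new \InvalidArgumentException('Total variance can be a value between 0.1 and 0.99');
        }
        if ($numFeatures !== null && $numFeatures <= 0) {
            throw new \InvalidArgumentException('Number of features to be preserved should be greater than 0');
        }
        if ($totalVariance !== null && $numFeatures !== null) {
            throw new \InvalidArgumentException('Specify either totalVariance or numFeatures, not both');
        }

        if ($numFeatures !== null) {
            $this->numFeatures = $numFeatures;
        }
        if ($totalVariance !== null) {
            $this->totalVariance = $totalVariance;
        }
    }
}

// Hypothetical subclass: its constructor shrinks to a single call.
final class DemoReducer extends ReducerBase
{
    public function __construct(?float $totalVariance = null, ?int $numFeatures = null)
    {
        $this->setReductionTarget($totalVariance, $numFeatures);
    }

    // Exposes the stored options so the sketch can be inspected.
    public function target(): array
    {
        return [$this->totalVariance, $this->numFeatures];
    }
}

$reducer = new DemoReducer(0.9);
```

With the checks in one place, each class that shares the constructor keeps only the call, so a change to the allowed range has to be made once.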