Passed
Push — master ( ed4035...f2dd38 )
by MusikAnimal
10:32
created

Blame::getAsOf()   A

Complexity

Conditions 3
Paths 3

Size

Total Lines 11
Code Lines 6

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 0
CRAP Score 12

Importance

Changes 0

Metric   Value
cc       3
eloc     6
nc       3
nop      0
dl       0
loc      11
ccs      0
cts      7
cp       0
crap     12
rs       10
c        0
b        0
f        0

<?php
declare(strict_types = 1);

namespace AppBundle\Model;

/**
 * A Blame will search the given page for the given text and return the relevant revisions and authors.
 */
class Blame extends Authorship
{
    /** @var string Text to search for. */
    protected $query;

    /** @var array|null Matches, keyed by revision ID, each with keys 'edit' <Edit> and 'tokens' <string[]>. */
    protected $matches;

    /** @var Edit|null Target revision that is being blamed. */
    protected $asOf;

    /**
     * Blame constructor.
     * @param Page $page The page to process.
     * @param string $query Text to search for.
     * @param string|null $target Either a revision ID or date in YYYY-MM-DD format. Null to use latest revision.
     */
    public function __construct(Page $page, string $query, ?string $target = null)
    {
        parent::__construct($page, $target);

        $this->query = $query;
    }

    /**
     * Get the search query.
     * @return string
     */
    public function getQuery(): string
    {
        return $this->query;
    }

    /**
     * Matches, keyed by revision ID, each with keys 'edit' <Edit> and 'tokens' <string[]>.
     * @return array|null
     */
    public function getMatches(): ?array
    {
        return $this->matches;
    }

    /**
     * Get all the matches as Edits.
     * @return Edit[]|null
     */
    public function getEdits(): ?array
    {
        // Scrutinizer issue (bug): $this->matches can also be null, but parameter $array of
        // array_column() only accepts an array. Consider adding a type check. If this is a
        // false positive, it can be ignored with the ignore-type annotation:
        //     return array_column(/** @scrutinizer ignore-type */ $this->matches, 'edit');
        return array_column($this->matches, 'edit');
    }

    /**
     * Strip out spaces, since they are not accounted for in the WikiWho API.
     * @return string
     */
    public function getTokenizedQuery(): string
    {
        return strtolower(preg_replace('/\s*/m', '', $this->query));
    }

    /**
     * Get the first "token" of the search query. A "token" in this case is a word or group of syntax,
     * roughly correlating to the token structure returned by the WikiWho API.
     * @return string
     */
    public function getFirstQueryToken(): string
    {
        return strtolower(preg_split('/[\n\s]/', $this->query)[0]);
    }

    /**
     * Get the target revision that is being blamed.
     * @return Edit|null
     */
    public function getAsOf(): ?Edit
    {
        if (isset($this->asOf)) {
            return $this->asOf;
        }

        // Scrutinizer issue (bug): the method getEditFromRevId() does not exist on
        // AppBundle\Repository\Repository. The code appears to rely on a sub-type such as
        // AppBundle\Repository\BlameRepository. If this is a false positive, it can be ignored
        // with the ignore-call annotation:
        //     ? $this->getRepository()->/** @scrutinizer ignore-call */ getEditFromRevId($this->page, $this->target)
        $this->asOf = $this->target
            ? $this->getRepository()->getEditFromRevId($this->page, $this->target)
            : null;

        return $this->asOf;
    }

    /**
     * Get authorship attribution from the WikiWho API.
     * @see https://www.f-squared.org/wikiwho/
     */
    public function prepareData(): void
    {
        if (isset($this->matches)) {
            return;
        }

        // Get revision data. getRevisionData() returns null if there are errors.
        $revisionData = $this->getRevisionData(true);
        if (null === $revisionData) {
            return;
        }

        $matches = $this->searchTokens($revisionData['tokens']);

        // We want the results grouped by editor and revision ID.
        $this->matches = [];
        foreach ($matches as $match) {
            if (isset($this->matches[$match['id']])) {
                $this->matches[$match['id']]['tokens'][] = $match['token'];
                continue;
            }

            $edit = $this->getRepository()->getEditFromRevId($this->page, $match['id']);
            $this->matches[$match['id']] = [
                'edit' => $edit,
                'tokens' => [$match['token']],
            ];
        }
    }

    /**
     * Find matches of search query in the given list of tokens.
     * @param array $tokens
     * @return array
     */
    private function searchTokens(array $tokens): array
    {
        $matchData = [];
        $matchDataSoFar = [];
        $matchSoFar = '';
        $firstQueryToken = $this->getFirstQueryToken();
        $tokenizedQuery = $this->getTokenizedQuery();

        foreach ($tokens as $token) {
            // The previous matches plus the new token. This is basically a candidate for what may become $matchSoFar.
            $newMatchSoFar = $matchSoFar.$token['str'];

            // We first check if the first token of the query matches, because we want to allow for partial matches
            // (e.g. for query "barbaz", the tokens ["foobar","baz"] should match).
            if (false !== strpos($newMatchSoFar, $firstQueryToken)) {
                // If the full query is in the new match, use it, otherwise use just the first token. This is because
                // the full match may exist across multiple tokens, but the first match is only a partial match.
                $newMatchSoFar = false !== strpos($newMatchSoFar, $tokenizedQuery)
                    ? $newMatchSoFar
                    : $firstQueryToken;
            }

            // Keep track of tokens that match. To allow partial matches,
            // we check the query against $newMatchSoFar and vice versa.
            if (false !== strpos($tokenizedQuery, $newMatchSoFar) ||
                false !== strpos($newMatchSoFar, $tokenizedQuery)
            ) {
                $matchSoFar = $newMatchSoFar;
                $matchDataSoFar[] = [
                    'id' => $token['o_rev_id'],
                    'editor' => $token['editor'],
                    'token' => $token['str'],
                ];
            } elseif (!empty($matchSoFar)) {
                // We hit a token that isn't in the query string, so start over.
                $matchDataSoFar = [];
                $matchSoFar = '';
            }

            // A full match was found, so merge $matchDataSoFar into $matchData,
            // and start over to see if there are more matches in the article.
            if (false !== strpos($matchSoFar, $tokenizedQuery)) {
                $matchData = array_merge($matchData, $matchDataSoFar);
                $matchDataSoFar = [];
                $matchSoFar = '';
            }
        }

        // Full matches usually come last, but are the most relevant.
        return array_reverse($matchData);
    }
}
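
The matching loop in searchTokens() is easier to follow with concrete input. The following is a minimal standalone sketch, not part of the inspected commit: searchTokensSketch() is a hypothetical free function that copies the loop and the two query normalizations, and the sample tokens (revision IDs, editor names) are made up for illustration. It assumes tokens carry the 'str', 'o_rev_id' and 'editor' keys used in the class.

```php
<?php
declare(strict_types = 1);

/**
 * Hypothetical standalone copy of Blame::searchTokens(), for illustration only.
 * @param array $tokens Each token has keys 'str', 'o_rev_id' and 'editor'.
 * @param string $query Raw search query.
 * @return array Matching tokens, last match first.
 */
function searchTokensSketch(array $tokens, string $query): array
{
    // Same normalization as getFirstQueryToken() and getTokenizedQuery().
    $firstQueryToken = strtolower(preg_split('/[\n\s]/', $query)[0]);
    $tokenizedQuery = strtolower(preg_replace('/\s*/m', '', $query));

    $matchData = [];
    $matchDataSoFar = [];
    $matchSoFar = '';

    foreach ($tokens as $token) {
        // Candidate for the next value of $matchSoFar.
        $newMatchSoFar = $matchSoFar.$token['str'];

        // A token that merely contains the first query word starts a partial match.
        if (false !== strpos($newMatchSoFar, $firstQueryToken)) {
            $newMatchSoFar = false !== strpos($newMatchSoFar, $tokenizedQuery)
                ? $newMatchSoFar
                : $firstQueryToken;
        }

        if (false !== strpos($tokenizedQuery, $newMatchSoFar) ||
            false !== strpos($newMatchSoFar, $tokenizedQuery)
        ) {
            $matchSoFar = $newMatchSoFar;
            $matchDataSoFar[] = [
                'id' => $token['o_rev_id'],
                'editor' => $token['editor'],
                'token' => $token['str'],
            ];
        } elseif (!empty($matchSoFar)) {
            // Token is not part of the query, so discard the partial match.
            $matchDataSoFar = [];
            $matchSoFar = '';
        }

        // Full match: keep the collected tokens and look for further matches.
        if (false !== strpos($matchSoFar, $tokenizedQuery)) {
            $matchData = array_merge($matchData, $matchDataSoFar);
            $matchDataSoFar = [];
            $matchSoFar = '';
        }
    }

    return array_reverse($matchData);
}

// Made-up sample data: the query "bar baz" spans the end of one token and all of the next.
$tokens = [
    ['str' => 'foobar', 'o_rev_id' => 101, 'editor' => 'Alice'],
    ['str' => 'baz', 'o_rev_id' => 102, 'editor' => 'Bob'],
    ['str' => 'qux', 'o_rev_id' => 103, 'editor' => 'Carol'],
];
$result = searchTokensSketch($tokens, 'bar baz');

echo json_encode(array_column($result, 'id')), PHP_EOL; // [102,101]
```

With this input the query is normalized to the first token "bar" and the tokenized query "barbaz". The token "foobar" opens a partial match because it contains "bar", "baz" completes it, and "qux" is ignored. The result lists revision 102 before 101 because of the final array_reverse().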
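
Scrutinizer's note on getEdits() points out that $this->matches is nullable while array_column() expects an array. One possible way to address it, sketched here under assumptions, is an explicit null check before the call. BlameSketch is a hypothetical stand-in class reduced to the $matches property, setMatches() exists only to drive the example, and the 'edit' values are placeholder strings rather than real Edit objects. This is not the fix adopted in the project.

```php
<?php
declare(strict_types = 1);

/**
 * Hypothetical stand-in for the Blame model, reduced to the $matches property.
 */
class BlameSketch
{
    /** @var array|null Matches keyed by revision ID, or null if data has not been prepared. */
    private $matches;

    /** Helper for this example only; the real class fills $matches in prepareData(). */
    public function setMatches(?array $matches): void
    {
        $this->matches = $matches;
    }

    /**
     * Get all the matches as edits, or null if no data has been prepared yet.
     * @return array|null
     */
    public function getEdits(): ?array
    {
        // Guard against null so array_column() always receives an array.
        if (null === $this->matches) {
            return null;
        }

        return array_column($this->matches, 'edit');
    }
}

$blame = new BlameSketch();
$before = $blame->getEdits();

$blame->setMatches([
    101 => ['edit' => 'edit-101', 'tokens' => ['foobar']],
    102 => ['edit' => 'edit-102', 'tokens' => ['baz']],
]);
$after = $blame->getEdits();

echo json_encode($before), PHP_EOL; // null
echo json_encode($after), PHP_EOL;  // ["edit-101","edit-102"]
```

Returning null keeps the method consistent with its declared `?array` return type. Note that array_column() re-indexes the result, so the revision ID keys are not preserved.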