Test Setup Failed
Pull Request — main (#426)
by MusikAnimal
17:10 queued 11:44
created

GlobalContribsRepository::getRevisions()   C

Complexity

Conditions 12
Paths 67

Size

Total Lines 123
Code Lines 62

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
cc 12
eloc 62
nc 67
nop 7
dl 0
loc 123
rs 6.4024
c 0
b 0
f 0

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, particularly when combined with a good name. Besides, a small method is usually much easier to name well.

For example, if you find yourself adding comments inside a method's body, this is usually a good sign that the commented part should be extracted into a new method, with the comment serving as a starting point for that method's name.

Commonly applied refactorings include:
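One refactoring that fits the advice above is Extract Method. The following is a minimal sketch of it, not code from this pull request: the class `RevisionSorter` and its method names are hypothetical, and the two helpers stand in for blocks that would otherwise sit inside one long method under comments such as "re-sort by timestamp" and "truncate size to $limit".

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical example (not part of GlobalContribsRepository): each commented
 * block of a long method becomes a small method named after its comment.
 */
final class RevisionSorter
{
    /** Entry point: reads as a summary of the steps instead of their details. */
    public function newestFirst(array $revisions, int $limit): array
    {
        $sorted = $this->sortByTimestampDescending($revisions);
        return $this->truncate($sorted, $limit);
    }

    /** Formerly an inline block under a "re-sort by timestamp" comment. */
    private function sortByTimestampDescending(array $revisions): array
    {
        usort($revisions, static function (array $a, array $b): int {
            // Swapping $a and $b yields descending order.
            return $b['unix_timestamp'] <=> $a['unix_timestamp'];
        });
        return $revisions;
    }

    /** Formerly an inline block under a "truncate size to $limit" comment. */
    private function truncate(array $revisions, int $limit): array
    {
        return array_slice($revisions, 0, $limit);
    }
}

$sorter = new RevisionSorter();
$result = $sorter->newestFirst([
    ['id' => 1, 'unix_timestamp' => 100],
    ['id' => 2, 'unix_timestamp' => 300],
    ['id' => 3, 'unix_timestamp' => 200],
], 2);
echo implode(',', array_column($result, 'id')), PHP_EOL; // prints "2,3"
```

Each helper can now be read and tested on its own, and the calling method's conditions and paths shrink accordingly.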

<?php
declare(strict_types = 1);

namespace App\Repository;

use App\Model\Project;
use App\Model\User;
use PDO;
use Symfony\Component\DependencyInjection\ContainerInterface;
use Wikimedia\IPUtils;

/**
 * A GlobalContribsRepository is responsible for retrieving information from the database for the GlobalContribs tool.
 * @codeCoverageIgnore
 */
class GlobalContribsRepository extends Repository
{
    /** @var Project CentralAuth project (meta.wikimedia for WMF installation). */
    protected $caProject;

    /**
     * Create Project and ProjectRepository once we have the container.
     * @param ContainerInterface $container
     */
    public function setContainer(ContainerInterface $container): void
    {
        parent::setContainer($container);

        $this->caProject = ProjectRepository::getProject(
            $this->container->getParameter('central_auth_project'),
            $this->container
        );
        $this->caProject->getRepository()
            ->setContainer($this->container);
    }

    /**
     * Get a user's edit count for each project.
     * @see GlobalContribsRepository::globalEditCountsFromCentralAuth()
     * @see GlobalContribsRepository::globalEditCountsFromDatabases()
     * @param User $user The user.
     * @return mixed[]|null Elements are arrays with 'dbName' (string), 'total' (int) and 'project' (Project).
     *   Null if anon (too slow).
     */
    public function globalEditCounts(User $user): ?array
    {
        if ($user->isAnon()) {
            return null;
        }

        // Get the edit counts from CentralAuth or database.
        $editCounts = $this->globalEditCountsFromCentralAuth($user);

        // Pre-populate all projects' metadata, to prevent each project call from fetching it.
        $this->caProject->getRepository()->getAll();

Bug
The method getAll() does not exist on App\Repository\Repository. It seems the code relies on a sub-type of App\Repository\Repository, such as App\Repository\ProjectRepository. (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

        $this->caProject->getRepository()->/** @scrutinizer ignore-call */ getAll();

        // Compile the output.
        $out = [];
        foreach ($editCounts as $editCount) {
            $out[] = [
                'dbName' => $editCount['dbName'],
                'total' => $editCount['total'],
                'project' => ProjectRepository::getProject($editCount['dbName'], $this->container),
            ];
        }
        return $out;
    }

    /**
     * Get a user's total edit count on one or more projects.
     * Requires the CentralAuth extension to be installed on the project.
     * @param User $user The user.
     * @return mixed[]|false Elements are arrays with 'dbName' (string), and 'total' (int). False for logged out users.
     */
    protected function globalEditCountsFromCentralAuth(User $user)
    {
        if (true === $user->isAnon()) {
            return false;
        }

        // Set up cache.
        $cacheKey = $this->getCacheKey(func_get_args(), 'gc_globaleditcounts');
        if ($this->cache->hasItem($cacheKey)) {
            return $this->cache->getItem($cacheKey)->get();
        }

        $params = [
            'meta' => 'globaluserinfo',
            'guiprop' => 'editcount|merged',
            'guiuser' => $user->getUsername(),
        ];
        $result = $this->executeApiRequest($this->caProject, $params);
        if (!isset($result['query']['globaluserinfo']['merged'])) {
            return [];
        }
        $out = [];
        // Use a distinct loop variable so that $result is not overwritten while iterating.
        foreach ($result['query']['globaluserinfo']['merged'] as $wiki) {
            $out[] = [
                'dbName' => $wiki['wiki'],
                'total' => $wiki['editcount'],
            ];
        }

        // Cache and return.
        return $this->setCache($cacheKey, $out);
    }

    /**
     * Loop through the given dbNames and create Project objects for each.
     * @param array $dbNames
     * @return Project[] Keyed by database name.
     */
    private function formatProjects(array $dbNames): array
    {
        $projects = [];

        foreach ($dbNames as $dbName) {
            $projects[$dbName] = ProjectRepository::getProject($dbName, $this->container);
        }

        return $projects;
    }

    /**
     * Get all Projects on which the user has made at least one edit.
     * @param User $user
     * @return Project[]
     */
    public function getProjectsWithEdits(User $user): array
    {
        if ($user->isAnon()) {
            $dbNames = array_keys($this->getDbNamesAndActorIds($user));
        } else {
            $dbNames = [];

            foreach ($this->globalEditCountsFromCentralAuth($user) as $projectMeta) {
                if ($projectMeta['total'] > 0) {
                    $dbNames[] = $projectMeta['dbName'];
                }
            }
        }

        return $this->formatProjects($dbNames);
    }

    /**
     * Get projects that the user has made at least one edit on, and the associated actor ID.
     * @param User $user
     * @param string[] $dbNames Loop over these projects instead of all of them.
     * @return mixed[] Keys are database names, values are actor IDs.
     */
    public function getDbNamesAndActorIds(User $user, ?array $dbNames = null): array
    {
        // Check cache.
        $cacheKey = $this->getCacheKey(func_get_args(), 'gc_db_names_actor_ids');
        if ($this->cache->hasItem($cacheKey)) {
            return $this->cache->getItem($cacheKey)->get();
        }

        if (!$dbNames) {
            $dbNames = array_column($this->caProject->getRepository()->getAll(), 'dbName');
        }

        if ($user->isIpRange()) {
            $username = $user->getIpSubstringFromCidr().'%';
            $whereClause = "actor_name LIKE :actor";
        } else {
            $username = $user->getUsername();
            $whereClause = "actor_name = :actor";
        }

        $queriesBySlice = [];

        foreach ($dbNames as $dbName) {
            $slice = $this->getDbList()[$dbName];
            // actor_revision table only includes users who have made at least one edit.
            $actorTable = $this->getTableName($dbName, 'actor', 'revision');
            $queriesBySlice[$slice][] = "SELECT '$dbName' AS `dbName`, actor_id " .
                "FROM $actorTable WHERE $whereClause";
        }

        $actorIds = [];

        foreach ($queriesBySlice as $slice => $queries) {
            $sql = implode(' UNION ', $queries);
            $resultQuery = $this->executeProjectsQuery($slice, $sql, [
                'actor' => $username,
            ]);

            while ($row = $resultQuery->fetchAssociative()) {
                $actorIds[$row['dbName']] = (int)$row['actor_id'];
            }
        }

        return $this->setCache($cacheKey, $actorIds);
    }

    /**
     * Get revisions by this user across the given Projects.
     * @param string[] $dbNames Database names of projects to iterate over.
     * @param User $user The user.
     * @param int|string $namespace Namespace ID or 'all' for all namespaces.
     * @param int|false $start Unix timestamp or false.
     * @param int|false $end Unix timestamp or false.
     * @param int $limit The maximum number of revisions to fetch from each project.
     * @param int|false $offset Unix timestamp. Used for pagination.
     * @return array
     */
    public function getRevisions(
        array $dbNames,
        User $user,
        $namespace = 'all',
        $start = false,
        $end = false,
        int $limit = 31, // One extra to know whether there should be another page.
        $offset = false
    ): array {
        // Check cache.
        $cacheKey = $this->getCacheKey(func_get_args(), 'gc_revisions');
        if ($this->cache->hasItem($cacheKey)) {
            return $this->cache->getItem($cacheKey)->get();
        }

        // Just need any Connection to use the ->quote() method.
        $quoteConn = $this->getProjectsConnection('s1');
        $username = $quoteConn->quote($user->getUsername(), PDO::PARAM_STR);

        // IP range handling.
        $startIp = '';
        $endIp = '';
        if ($user->isIpRange()) {
            [$startIp, $endIp] = IPUtils::parseRange($user->getUsername());
            $startIp = $quoteConn->quote($startIp, PDO::PARAM_STR);
            $endIp = $quoteConn->quote($endIp, PDO::PARAM_STR);
        }

        // Fetch actor IDs (for IP ranges, it strips trailing zeros and uses a LIKE query).
        $actorIds = $this->getDbNamesAndActorIds($user, $dbNames);

        if (!$actorIds) {
            return [];
        }

        $namespaceCond = 'all' === $namespace
            ? ''
            : 'AND page_namespace = '.(int)$namespace;
        $revDateConditions = $this->getDateConditions($start, $end, $offset, 'revs.', 'rev_timestamp');

        // Assemble queries.
        $queriesBySlice = [];
        $projectRepo = $this->caProject->getRepository();
        foreach ($dbNames as $dbName) {
            if (isset($actorIds[$dbName])) {
                $revisionTable = $projectRepo->getTableName($dbName, 'revision');
                $pageTable = $projectRepo->getTableName($dbName, 'page');
                $commentTable = $projectRepo->getTableName($dbName, 'comment', 'revision');
                $actorTable = $projectRepo->getTableName($dbName, 'actor', 'revision');
                $tagTable = $projectRepo->getTableName($dbName, 'change_tag');
                $tagDefTable = $projectRepo->getTableName($dbName, 'change_tag_def');

                if ($user->isIpRange()) {
                    $ipcTable = $projectRepo->getTableName($dbName, 'ip_changes');
                    $ipcJoin = "JOIN $ipcTable ON revs.rev_id = ipc_rev_id";
                    $whereClause = "ipc_hex BETWEEN $startIp AND $endIp";
                    $username = 'actor_name';
                } else {
                    $ipcJoin = '';
                    $whereClause = 'revs.rev_actor = '.$actorIds[$dbName];
                }

                $slice = $this->getDbList()[$dbName];
                $queriesBySlice[$slice][] = "
                    SELECT
                        '$dbName' AS dbName,
                        revs.rev_id AS id,
                        revs.rev_timestamp AS timestamp,
                        UNIX_TIMESTAMP(revs.rev_timestamp) AS unix_timestamp,
                        revs.rev_minor_edit AS minor,
                        revs.rev_deleted AS deleted,
                        revs.rev_len AS length,
                        (CAST(revs.rev_len AS SIGNED) - IFNULL(parentrevs.rev_len, 0)) AS length_change,
                        revs.rev_parent_id AS parent_id,
                        $username AS username,
                        page.page_title,
                        page.page_namespace,
                        comment_text AS comment,
                        (
                            SELECT 1
                            FROM $tagTable
                            WHERE ct_rev_id = revs.rev_id
                            AND ct_tag_id = (
                                SELECT ctd_id
                                FROM $tagDefTable
                                WHERE ctd_name = 'mw-reverted'
                            )
                            LIMIT 1
                        ) AS reverted
                    FROM $revisionTable AS revs
                        $ipcJoin
                        JOIN $pageTable AS page ON (rev_page = page_id)
                        JOIN $actorTable ON (actor_id = revs.rev_actor)
                        LEFT JOIN $revisionTable AS parentrevs ON (revs.rev_parent_id = parentrevs.rev_id)
                        LEFT OUTER JOIN $commentTable ON revs.rev_comment_id = comment_id
                    WHERE $whereClause
                        $namespaceCond
                        $revDateConditions";
            }
        }

        // Re-assemble into UNIONed queries, executing as many per slice as possible.
        $revisions = [];
        foreach ($queriesBySlice as $slice => $queries) {
            $sql = "SELECT * FROM ((\n" . join("\n) UNION (\n", $queries) . ")) a ORDER BY timestamp DESC LIMIT $limit";
            $revisions = array_merge($revisions, $this->executeProjectsQuery($slice, $sql)->fetchAllAssociative());
        }

        // If there are more than $limit results, re-sort by timestamp.
        if (count($revisions) > $limit) {
            usort($revisions, function ($a, $b) {
                if ($a['unix_timestamp'] === $b['unix_timestamp']) {
                    return 0;
                }
                return $a['unix_timestamp'] > $b['unix_timestamp'] ? -1 : 1;
            });

            // Truncate size to $limit.
            $revisions = array_slice($revisions, 0, $limit);
        }

        // Cache and return.
        return $this->setCache($cacheKey, $revisions);
    }
}