Passed
Push — master ( 6a210c...1c40ea )
by MusikAnimal
07:35
created

AutoEditsRepository   C

Complexity

Total Complexity 54

Size/Duplication

Total Lines 517
Duplicated Lines 0 %

Importance

Changes 0
Metric Value
eloc 212
dl 0
loc 517
rs 6.4799
c 0
b 0
f 0
wmc 54

10 Methods

Rating   Name   Duplication   Size   Complexity  
A getNonAutomatedEdits() 0 56 5
A getAutomatedCountsSql() 0 43 4
A usesSingleTag() 0 3 1
A getInnerAutomatedCountsSql() 0 41 5
A getToolCounts() 0 39 4
A getTools() 0 21 6
A countAutomatedEdits() 0 52 5
B getAutomatedEdits() 0 84 9
A getTags() 0 26 4
B getToolRegexAndTags() 0 38 11

How to fix   Complexity   

Complex Class

Complex classes like AutoEditsRepository often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to find such a component is to look for fields/methods that share the same prefixes, or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

While breaking up the class, it is a good idea to analyze how other classes use AutoEditsRepository, and based on these observations, apply Extract Interface, too.

1
<?php
2
/**
3
 * This file contains only the AutoEditsRepository class.
4
 */
5
6
declare(strict_types = 1);
7
8
namespace AppBundle\Repository;
9
10
use AppBundle\Model\Project;
11
use AppBundle\Model\User;
12
13
/**
14
 * AutoEditsRepository is responsible for retrieving data from the database
15
 * about the automated edits made by a user.
16
 * @codeCoverageIgnore
17
 */
18
class AutoEditsRepository extends UserRepository
19
{
20
    /** @var array List of automated tools, used for fetching the tool list and filtering it. */
21
    private $aeTools;
22
23
    /**
24
     * Method to give the repository access to the AutomatedEditsHelper and fetch the list of semi-automated tools.
25
     * @param Project $project
26
     * @param int|string $namespace Namespace ID or 'all'.
27
     * @return array
28
     */
29
    public function getTools(Project $project, $namespace = 'all'): array
30
    {
31
        if (!isset($this->aeTools)) {
32
            $this->aeTools = $this->container
33
                ->get('app.automated_edits_helper')
34
                ->getTools($project);
35
        }
36
37
        if ('all' !== $namespace) {
38
            // Limit by namespace.
39
            return array_filter($this->aeTools, function (array $tool) use ($namespace) {
40
                return empty($tool['namespaces']) ||
41
                    in_array((int)$namespace, $tool['namespaces']) ||
42
                    (
43
                        1 === $namespace % 2 &&
44
                        isset($tool['talk_namespaces'])
45
                    );
46
            });
47
        }
48
49
        return $this->aeTools;
50
    }
51
52
    /**
53
     * Is the tag for given tool intended to be counted by itself? For instance,
54
     * when counting Rollback edits we don't want to also count Huggle edits (which are tagged as Rollback).
55
     * @param Project $project
56
     * @param string|null $tool
57
     * @return bool
58
     */
59
    private function usesSingleTag(Project $project, ?string $tool): bool
60
    {
61
        return isset($this->getTools($project)[$tool]['single_tag']);
62
    }
63
64
    /**
65
     * Get the number of edits this user made using semi-automated tools.
66
     * @param Project $project
67
     * @param User $user
68
     * @param string|int $namespace Namespace ID or 'all'
69
     * @param string $start Start date in a format accepted by strtotime()
70
     * @param string $end End date in a format accepted by strtotime()
71
     * @return int Result of query, see below.
72
     */
73
    public function countAutomatedEdits(Project $project, User $user, $namespace = 'all', $start = '', $end = ''): int
74
    {
75
        $cacheKey = $this->getCacheKey(func_get_args(), 'user_autoeditcount');
76
        if ($this->cache->hasItem($cacheKey)) {
77
            return $this->cache->getItem($cacheKey)->get();
78
        }
79
80
        [$condBegin, $condEnd] = $this->getRevTimestampConditions($start, $end);
81
82
        // Get the combined regex and tags for the tools
83
        [$regex, $tagIds] = $this->getToolRegexAndTags($project, false, null, $namespace);
84
85
        [$pageJoin, $condNamespace] = $this->getPageAndNamespaceSql($project, $namespace);
86
87
        $revisionTable = $project->getTableName('revision');
88
        $commentTable = $project->getTableName('comment');
89
        $tagTable = $project->getTableName('change_tag');
90
        $commentJoin = '';
91
        $tagJoin = '';
92
93
        $params = [];
94
95
        // Build SQL for detecting autoedits via regex and/or tags
96
        $condTools = [];
97
        if ('' != $regex) {
98
            $commentJoin = "LEFT OUTER JOIN $commentTable ON rev_comment_id = comment_id";
99
            $condTools[] = "comment_text REGEXP :tools";
100
            $params['tools'] = $regex;
101
        }
102
        if ('' != $tagIds) {
103
            $tagJoin = "LEFT OUTER JOIN $tagTable ON ct_rev_id = rev_id";
104
            $condTools[] = "ct_tag_id IN ($tagIds)";
105
        }
106
        $condTool = 'AND (' . implode(' OR ', $condTools) . ')';
107
        $userClause = $user->isAnon() ? 'rev_user_text = :username' : 'rev_user = :userId';
108
109
        $sql = "SELECT COUNT(DISTINCT(rev_id))
110
                FROM $revisionTable
111
                $pageJoin
112
                $commentJoin
113
                $tagJoin
114
                WHERE $userClause
115
                $condNamespace
116
                $condTool
117
                $condBegin
118
                $condEnd";
119
120
        $resultQuery = $this->executeQuery($sql, $project, $user, $namespace, $start, $end, $params);
121
        $result = (int)$resultQuery->fetchColumn();
122
123
        // Cache and return.
124
        return $this->setCache($cacheKey, $result);
125
    }
126
127
    /**
128
     * Get non-automated contributions for the given user.
129
     * @param Project $project
130
     * @param User $user
131
     * @param string|int $namespace Namespace ID or 'all'.
132
     * @param string $start Start date in a format accepted by strtotime().
133
     * @param string $end End date in a format accepted by strtotime().
134
     * @param int $offset Used for pagination, offset results by N edits.
135
     * @return string[] Result of query, with columns 'page_title', 'page_namespace', 'rev_id', 'timestamp', 'minor',
136
     *   'length', 'length_change', 'comment'.
137
     */
138
    public function getNonAutomatedEdits(
139
        Project $project,
140
        User $user,
141
        $namespace = 'all',
142
        $start = '',
143
        $end = '',
144
        int $offset = 0
145
    ): array {
146
        $cacheKey = $this->getCacheKey(func_get_args(), 'user_nonautoedits');
147
        if ($this->cache->hasItem($cacheKey)) {
148
            return $this->cache->getItem($cacheKey)->get();
149
        }
150
151
        [$condBegin, $condEnd] = $this->getRevTimestampConditions($start, $end, 'revs.');
152
153
        // Get the combined regex and tags for the tools
154
        [$regex, $tagIds] = $this->getToolRegexAndTags($project, false, null, $namespace);
155
156
        $pageTable = $project->getTableName('page');
157
        $revisionTable = $project->getTableName('revision');
158
        $commentTable = $project->getTableName('comment');
159
        $tagTable = $project->getTableName('change_tag');
160
        $condNamespace = 'all' === $namespace ? '' : 'AND page_namespace = :namespace';
161
        $condTag = '' != $tagIds ? "AND NOT EXISTS (SELECT 1 FROM $tagTable
162
            WHERE ct_rev_id = revs.rev_id AND ct_tag_id IN ($tagIds))" : '';
163
        $userClause = $user->isAnon() ? 'rev_user_text = :username' : 'rev_user = :userId';
164
        $sql = "SELECT
165
                    page_title,
166
                    page_namespace,
167
                    revs.rev_id AS rev_id,
168
                    revs.rev_timestamp AS timestamp,
169
                    revs.rev_minor_edit AS minor,
170
                    revs.rev_len AS length,
171
                    (CAST(revs.rev_len AS SIGNED) - IFNULL(parentrevs.rev_len, 0)) AS length_change,
172
                    comment_text AS comment
173
                FROM $pageTable
174
                JOIN $revisionTable AS revs ON (page_id = revs.rev_page)
175
                LEFT JOIN $revisionTable AS parentrevs ON (revs.rev_parent_id = parentrevs.rev_id)
176
                LEFT OUTER JOIN $commentTable ON (revs.rev_comment_id = comment_id)
177
                WHERE revs.{$userClause}
178
                AND revs.rev_timestamp > 0
179
                AND comment_text NOT RLIKE :tools
180
                $condTag
181
                $condBegin
182
                $condEnd
183
                $condNamespace
184
                GROUP BY revs.rev_id
185
                ORDER BY revs.rev_timestamp DESC
186
                LIMIT 50
187
                OFFSET $offset";
188
189
        $resultQuery = $this->executeQuery($sql, $project, $user, $namespace, $start, $end, ['tools' => $regex]);
190
        $result = $resultQuery->fetchAll();
191
192
        // Cache and return.
193
        return $this->setCache($cacheKey, $result);
194
    }
195
196
    /**
197
     * Get (semi-)automated contributions for the given user, and optionally for a given tool.
198
     * @param Project $project
199
     * @param User $user
200
     * @param string|int $namespace Namespace ID or 'all'.
201
     * @param string $start Start date in a format accepted by strtotime().
202
     * @param string $end End date in a format accepted by strtotime().
203
     * @param string|null $tool Only get edits made with this tool. Must match the keys in semi_automated.yml.
204
     * @param int $offset Used for pagination, offset results by N edits.
205
     * @return string[] Result of query, with columns 'page_title', 'page_namespace', 'rev_id', 'timestamp', 'minor',
206
     *   'length', 'length_change', 'comment'.
207
     */
208
    public function getAutomatedEdits(
209
        Project $project,
210
        User $user,
211
        $namespace = 'all',
212
        $start = '',
213
        $end = '',
214
        ?string $tool = null,
215
        int $offset = 0
216
    ): array {
217
        $cacheKey = $this->getCacheKey(func_get_args(), 'user_autoedits');
218
        if ($this->cache->hasItem($cacheKey)) {
219
            return $this->cache->getItem($cacheKey)->get();
220
        }
221
222
        [$condBegin, $condEnd] = $this->getRevTimestampConditions($start, $end, 'revs.');
223
224
        // In this case there is a slight performance improvement we can make if we're not given a start date.
225
        if ('' == $condBegin && '' == $condEnd) {
226
            $condBegin = 'AND revs.rev_timestamp > 0';
227
        }
228
229
        // Get the combined regex and tags for the tools
230
        [$regex, $tagIds] = $this->getToolRegexAndTags($project, false, $tool);
231
232
        $pageTable = $project->getTableName('page');
233
        $revisionTable = $project->getTableName('revision');
234
        $commentTable = $project->getTableName('comment');
235
        $tagTable = $project->getTableName('change_tag');
236
        $condNamespace = 'all' === $namespace ? '' : 'AND page_namespace = :namespace';
237
        $userClause = $user->isAnon() ? 'rev_user_text = :username' : 'rev_user = :userId';
238
        $tagJoin = '';
239
        $condsTool = [];
240
241
        if ('' != $regex) {
242
            $condsTool[] = 'comment_text RLIKE :tools';
243
        }
244
245
        if ('' != $tagIds) {
246
            $tagJoin = "LEFT OUTER JOIN $tagTable ON (ct_rev_id = revs.rev_id)";
247
            if ($this->usesSingleTag($project, $tool)) {
248
                // Only show edits made with the tool that don't overlap with other tools.
249
                // For instance, Huggle edits are also tagged as Rollback, but when viewing
250
                // Rollback edits we don't want to show Huggle edits.
251
                $condsTool[] = "
252
                    EXISTS (
253
                        SELECT COUNT(ct_tag_id) AS tag_count
254
                        FROM $tagTable
255
                        WHERE ct_rev_id = revs.rev_id
256
                        HAVING tag_count = 1 AND ct_tag_id = $tagIds
257
                    )";
258
            } else {
259
                $condsTool[] = "ct_tag_id IN ($tagIds)";
260
            }
261
        }
262
263
        $sql = "SELECT
264
                    page_title,
265
                    page_namespace,
266
                    revs.rev_id AS rev_id,
267
                    revs.rev_timestamp AS timestamp,
268
                    revs.rev_minor_edit AS minor,
269
                    revs.rev_len AS length,
270
                    (CAST(revs.rev_len AS SIGNED) - IFNULL(parentrevs.rev_len, 0)) AS length_change,
271
                    comment_text AS comment
272
                FROM $pageTable
273
                JOIN $revisionTable AS revs ON (page_id = revs.rev_page)
274
                LEFT JOIN $revisionTable AS parentrevs ON (revs.rev_parent_id = parentrevs.rev_id)
275
                LEFT OUTER JOIN $commentTable ON (revs.rev_comment_id = comment_id)
276
                $tagJoin
277
                WHERE revs.{$userClause}
278
                $condBegin
279
                $condEnd
280
                $condNamespace
281
                AND (".implode(' OR ', $condsTool).")
282
                GROUP BY revs.rev_id
283
                ORDER BY revs.rev_timestamp DESC
284
                LIMIT 50
285
                OFFSET $offset";
286
287
        $resultQuery = $this->executeQuery($sql, $project, $user, $namespace, $start, $end, ['tools' => $regex]);
288
        $result = $resultQuery->fetchAll();
289
290
        // Cache and return.
291
        return $this->setCache($cacheKey, $result);
292
    }
293
294
    /**
295
     * Get counts of known automated tools used by the given user.
296
     * @param Project $project
297
     * @param User $user
298
     * @param string|int $namespace Namespace ID or 'all'.
299
     * @param string $start Start date in a format accepted by strtotime().
300
     * @param string $end End date in a format accepted by strtotime().
301
     * @return string[] Each tool that they used along with the count and link:
302
     *                  [
303
     *                      'Twinkle' => [
304
     *                          'count' => 50,
305
     *                          'link' => 'Wikipedia:Twinkle',
306
     *                      ],
307
     *                  ]
308
     */
309
    public function getToolCounts(
310
        Project $project,
311
        User $user,
312
        $namespace = 'all',
313
        $start = '',
314
        $end = ''
315
    ): array {
316
        $cacheKey = $this->getCacheKey(func_get_args(), 'user_autotoolcounts');
317
        if ($this->cache->hasItem($cacheKey)) {
318
            return $this->cache->getItem($cacheKey)->get();
319
        }
320
321
        $sql = $this->getAutomatedCountsSql($project, $user, $namespace, $start, $end);
322
        $resultQuery = $this->executeQuery($sql, $project, $user, $namespace, $start, $end);
323
324
        $tools = $this->getTools($project, $namespace);
325
326
        // handling results
327
        $results = [];
328
329
        while ($row = $resultQuery->fetch()) {
330
            // Only track tools that they've used at least once
331
            $tool = $row['toolname'];
332
            if ($row['count'] > 0) {
333
                $results[$tool] = [
334
                    'link' => $tools[$tool]['link'],
335
                    'label' => $tools[$tool]['label'] ?? $tool,
336
                    'count' => $row['count'],
337
                ];
338
            }
339
        }
340
341
        // Sort the array by count
342
        uasort($results, function ($a, $b) {
343
            return $b['count'] - $a['count'];
344
        });
345
346
        // Cache and return.
347
        return $this->setCache($cacheKey, $results);
348
    }
349
350
    /**
351
     * Get SQL for getting counts of known automated tools used by the given user.
352
     * @see self::getAutomatedCounts()
353
     * @param Project $project
354
     * @param User $user
355
     * @param string|int $namespace Namespace ID or 'all'.
356
     * @param string $start Start date in a format accepted by strtotime()
357
     * @param string $end End date in a format accepted by strtotime()
358
     * @return string The SQL.
359
     */
360
    private function getAutomatedCountsSql(Project $project, User $user, $namespace, $start, $end): string
361
    {
362
        [$condBegin, $condEnd] = $this->getRevTimestampConditions($start, $end);
363
364
        // Load the semi-automated edit types.
365
        $tools = $this->getTools($project, $namespace);
366
367
        // Create a collection of queries that we're going to run.
368
        $queries = [];
369
370
        $revisionTable = $project->getTableName('revision');
371
372
        [$pageJoin, $condNamespace] = $this->getPageAndNamespaceSql($project, $namespace);
373
        $userClause = $user->isAnon() ? 'rev_user_text = :username' : 'rev_user = :userId';
374
375
        $conn = $this->getProjectsConnection();
376
377
        foreach ($tools as $toolName => $values) {
378
            [$condTool, $commentJoin, $tagJoin] = $this->getInnerAutomatedCountsSql($project, $values);
379
380
            $toolName = $conn->quote($toolName, \PDO::PARAM_STR);
381
382
            // No regex or tag provided for this tool. This can happen for tag-only tools that are in the global
383
            // configuration, but no local tag exists on the said project.
384
            if ('' === $condTool) {
385
                continue;
386
            }
387
388
            $queries[] .= "
389
                SELECT $toolName AS toolname, COUNT(DISTINCT(rev_id)) AS count
390
                FROM $revisionTable
391
                $pageJoin
392
                $commentJoin
393
                $tagJoin
394
                WHERE $userClause
395
                AND $condTool
396
                $condNamespace
397
                $condBegin
398
                $condEnd";
399
        }
400
401
        // Combine to one big query.
402
        return implode(' UNION ', $queries);
403
    }
404
405
    /**
406
     * Get some of the inner SQL for self::getAutomatedCountsSql().
407
     * @param Project $project
408
     * @param string[] $values Values as defined in semi_automated.yml
409
     * @return string[] [Equality clause, JOIN clause]
410
     */
411
    private function getInnerAutomatedCountsSql(Project $project, array $values): array
412
    {
413
        $conn = $this->getProjectsConnection();
414
        $commentJoin = '';
415
        $tagJoin = '';
416
        $condTool = '';
417
418
        if (isset($values['regex'])) {
419
            $commentTable = $project->getTableName('comment');
420
            $commentJoin = "LEFT OUTER JOIN $commentTable ON rev_comment_id = comment_id";
421
            $regex = $conn->quote($values['regex'], \PDO::PARAM_STR);
422
            $condTool = "comment_text REGEXP $regex";
423
        }
424
        if (isset($values['tag']) && isset($this->getTags($project)[$values['tag']])) {
425
            $tagTable = $project->getTableName('change_tag');
426
            $tagJoin = "LEFT OUTER JOIN $tagTable ON ct_rev_id = rev_id";
427
428
            $tagId = $this->getTags($project)[$values['tag']];
429
430
            // This ensures we count only edits made with the given tool, and not other
431
            // edits that incidentally have the same tag. For instance, Huggle edits
432
            // are also tagged as Rollback, but we want to make them mutually exclusive.
433
            $tagClause = "
434
                EXISTS (
435
                    SELECT COUNT(ct_tag_id) AS tag_count
436
                    FROM $tagTable
437
                    WHERE ct_rev_id = rev_id
438
                    HAVING tag_count = 1 AND ct_tag_id = $tagId
439
                )";
440
441
            // Use tags in addition to the regex clause, if already present.
442
            // Tags are more reliable but may not be present for edits made with
443
            // older versions of the tool, before it started adding tags.
444
            if ('' === $condTool) {
445
                $condTool = $tagClause;
446
            } else {
447
                $condTool = '(' . $condTool . " OR $tagClause)";
448
            }
449
        }
450
451
        return [$condTool, $commentJoin, $tagJoin];
452
    }
453
454
    /**
455
     * Get the combined regex and tags for all semi-automated tools, or the given tool, ready to be used in a query.
456
     * @param Project $project
457
     * @param bool $nonAutoEdits Set to true to exclude tools with the 'contribs' flag.
458
     * @param string|null $tool
459
     * @param int|string|null $namespace Tools only used in given namespace ID, or 'all' for all namespaces.
460
     * @return array In the format: ['combined|regex', [1,2,3]] where the second element contains the tag IDs.
461
     */
462
    private function getToolRegexAndTags(
463
        Project $project,
464
        bool $nonAutoEdits = false,
465
        ?string $tool = null,
466
        $namespace = null
467
    ): array {
468
        $tools = $this->getTools($project);
469
        $regexes = [];
470
        $allTagIds = $this->getTags($project);
471
        $tagIds = [];
472
473
        if ('' != $tool) {
474
            $tools = [$tools[$tool]];
475
        }
476
477
        foreach (array_values($tools) as $values) {
478
            if ($nonAutoEdits && isset($values['contribs'])) {
479
                continue;
480
            }
481
482
            if (is_numeric($namespace) &&
483
                !empty($values['namespaces']) &&
484
                !in_array((int)$namespace, $values['namespaces'])
485
            ) {
486
                continue;
487
            }
488
489
            if (isset($values['regex'])) {
490
                $regexes[] = $values['regex'];
491
            }
492
            if (isset($values['tag']) && isset($allTagIds[$values['tag']])) {
493
                $tagIds[] = $allTagIds[$values['tag']];
494
            }
495
        }
496
497
        return [
498
            implode('|', $regexes),
499
            implode(',', $tagIds),
500
        ];
501
    }
502
503
    /**
504
     * Get the IDs of tags for given Project, which are used in the IN clauses of other queries above.
505
     * This join decomposition is actually faster than JOIN'ing on change_tag_def all in one query.
506
     * @param Project $project
507
     * @return int[]
508
     */
509
    public function getTags(Project $project): array
510
    {
511
        // Set up cache.
512
        $cacheKey = $this->getCacheKey(func_get_args(), 'ae_tag_ids');
513
        if ($this->cache->hasItem($cacheKey)) {
514
            return $this->cache->getItem($cacheKey)->get();
515
        }
516
517
        $conn = $this->getProjectsConnection();
518
519
        // Get all tag values.
520
        $tags = [];
521
        foreach (array_values($this->getTools($project)) as $values) {
522
            if (isset($values['tag'])) {
523
                $tags[] = $conn->quote($values['tag'], \PDO::PARAM_STR);
524
            }
525
        }
526
527
        $tags = implode(',', $tags);
528
        $tagDefTable = $project->getTableName('change_tag_def');
529
        $sql = "SELECT ctd_name, ctd_id FROM $tagDefTable
530
                WHERE ctd_name IN ($tags)";
531
        $result = $this->executeProjectsQuery($sql)->fetchAll(\PDO::FETCH_KEY_PAIR);
532
533
        // Cache and return.
534
        return $this->setCache($cacheKey, $result);
535
    }
536
}
537