Passed
Push — pageinfo ( 7c1380 )
by MusikAnimal
06:18
created

PageInfoApi::getTopEditorsByEditCount()   A

Complexity

Conditions 3
Paths 3

Size

Total Lines 36
Code Lines 24

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
cc 3
eloc 24
nc 3
nop 2
dl 0
loc 36
rs 9.536
c 0
b 0
f 0
<?php

declare(strict_types = 1);

namespace App\Model;

use App\Exception\BadGatewayException;
use App\Helper\AutomatedEditsHelper;
use App\Helper\I18nHelper;
use App\Repository\PageInfoRepository;
use DateTime;
use Symfony\Component\DomCrawler\Crawler;
use Symfony\Component\HttpKernel\Exception\HttpException;
use Symfony\Component\HttpKernel\Exception\ServiceUnavailableHttpException;

/**
 * A PageInfoApi is standalone logic for the PageInfo tool. These methods perform SQL queries
 * or make API requests and can be called directly, without any knowledge of the child PageInfo class.
 * @see PageInfo
 */
class PageInfoApi extends Model
{
    /** @var int Number of days of recent data to show for pageviews. */
    public const PAGEVIEWS_OFFSET = 30;

    protected AutomatedEditsHelper $autoEditsHelper;
    protected I18nHelper $i18n;

    /** @var int Number of revisions that belong to the page. */
    protected int $numRevisions;

    /** @var array Prose stats: 'bytes', 'characters', 'words', 'references', 'unique_references', 'sections'. */
    protected array $proseStats;

    /** @var array Number of categories, templates and files on the page. */
    protected array $transclusionData;

    /** @var array Various statistics about bots that edited the page. */
    protected array $bots;

    /** @var int Number of edits made to the page by bots. */
    protected int $botRevisionCount;

    /** @var int[] Number of incoming and outgoing links and redirects to the page. */
    protected array $linksAndRedirects;

    /** @var string[]|null Assessments of the page (see Page::getAssessments). */
    protected ?array $assessments;

    /** @var string[] List of Wikidata and CheckWiki errors. */
    protected array $bugs;

    /**
     * PageInfoApi constructor.
     * @param PageInfoRepository $repository
     * @param I18nHelper $i18n
     * @param AutomatedEditsHelper $autoEditsHelper
     * @param Page $page The page to process.
     * @param false|int $start Start date as Unix timestamp.
     * @param false|int $end End date as Unix timestamp.
     */
    public function __construct(
        PageInfoRepository $repository,
        I18nHelper $i18n,
        AutomatedEditsHelper $autoEditsHelper,
        Page $page,
        $start = false,
        $end = false
    ) {
        $this->repository = $repository;
        $this->i18n = $i18n;
        $this->autoEditsHelper = $autoEditsHelper;
        $this->page = $page;
        $this->start = $start;
        $this->end = $end;
    }

    /**
     * Get the number of revisions belonging to the page.
     * @return int
     */
    public function getNumRevisions(): int
    {
        if (!isset($this->numRevisions)) {
            $this->numRevisions = $this->page->getNumRevisions(null, $this->start, $this->end);
        }
        return $this->numRevisions;
    }

    /**
     * Are there more revisions than we should process, based on the config?
     * @return bool
     */
    public function tooManyRevisions(): bool
    {
        return $this->repository->getMaxPageRevisions() > 0 &&
            $this->getNumRevisions() > $this->repository->getMaxPageRevisions();
    }

    /**
     * Get various basic info used in the API, including the number of revisions, unique authors, initial author
     * and edit count of the initial author. This is combined into one query for better performance. Caching is
     * intentionally disabled: the gadget requests a different page on almost every call, so a cache would
     * rarely be hit.
     * @return string[]|false false if the page was not found.
     */
    public function getBasicEditingInfo()
    {
        return $this->repository->getBasicEditingInfo($this->page);
    }

    /**
     * Get the top editors to the page by edit count.
     * @param int $limit Default 20, maximum 1,000.
     * @param bool $noBots Set to true to exclude bots from the result.
     * @return array
     */
    public function getTopEditorsByEditCount(int $limit = 20, bool $noBots = false): array
    {
        // Quick cache, valid only for the same request. Note that it is shared by all
        // instances and is not keyed on $limit or $noBots.
        static $topEditors = null;
        if (null !== $topEditors) {
            return $topEditors;
        }

        $rows = $this->repository->getTopEditorsByEditCount(
            $this->page,
            $this->start,
            $this->end,
            min($limit, 1000),
            $noBots
        );

        $topEditors = [];
        $rank = 0;
        foreach ($rows as $row) {
            $topEditors[] = [
                'rank' => ++$rank,
                'username' => $row['username'],
                'count' => $row['count'],
                'minor' => $row['minor'],
                'first_edit' => [
                    'id' => $row['first_revid'],
                    'timestamp' => $row['first_timestamp'],
                ],
                'latest_edit' => [
                    'id' => $row['latest_revid'],
                    'timestamp' => $row['latest_timestamp'],
                ],
            ];
        }

        return $topEditors;
    }

    /**
     * Get prose and reference information.
     * @return array|null With keys 'bytes', 'characters', 'words', 'references', 'unique_references', 'sections',
     *   or null on failure.
     */
    public function getProseStats(): ?array
    {
        if (isset($this->proseStats)) {
            return $this->proseStats;
        }

        $datetime = is_int($this->end) ? new DateTime("@$this->end") : null;

        try {
            $html = $this->page->getHTMLContent($datetime);
        } catch (BadGatewayException $e) {
            // Prose stats are non-critical, so handle the BadGatewayException gracefully in the views.
            return null;
        }

        $crawler = new Crawler($html);
        $refs = $crawler->filter('[typeof~="mw:Extension/ref"]');

        [$bytes, $chars, $words] = $this->countCharsAndWords($crawler);

        $refContent = [];
        $refs->each(function ($ref) use (&$refContent): void {
            $refContent[] = $ref->text();
        });
        $uniqueRefs = count(array_unique($refContent));

        $this->proseStats = [
            'bytes' => $bytes,
            'characters' => $chars,
            'words' => $words,
            'references' => $refs->count(),
            'unique_references' => $uniqueRefs,
            'sections' => $crawler->filter('section')->count(),
        ];
        return $this->proseStats;
    }

    /**
     * Count the number of bytes, characters and words of the plain text.
     * @param Crawler $crawler
     * @return array [num bytes, num chars, num words]
     */
    private function countCharsAndWords(Crawler $crawler): array
    {
        $totalBytes = 0;
        $totalChars = 0;
        $totalWords = 0;
        $paragraphs = $crawler->filter('section > p');

        // Remove templates, TemplateStyles, math and reference tags.
        $crawler->filter(implode(',', [
            '#coordinates',
            '[class*="emplate"]',
            '[typeof~="mw:Extension/templatestyles"]',
            '[typeof~="mw:Extension/math"]',
            '[typeof~="mw:Extension/ref"]',
        ]))->each(function (Crawler $subCrawler) {
            foreach ($subCrawler as $subNode) {
                $subNode->parentNode->removeChild($subNode);
            }
        });

        $paragraphs->each(function ($node) use (&$totalBytes, &$totalChars, &$totalWords): void {
            /** @var Crawler $node */
            $text = $node->text();
            $totalBytes += strlen($text);
            $totalChars += mb_strlen($text);
            $totalWords += count(explode(' ', $text));
        });

        return [$totalBytes, $totalChars, $totalWords];
    }

    /**
     * Get the page assessments of the page.
     * @see https://www.mediawiki.org/wiki/Special:MyLanguage/Extension:PageAssessments
     * @return string[]|null null if unsupported.
     * @codeCoverageIgnore
     */
    public function getAssessments(): ?array
    {
        if (!isset($this->assessments)) {
            $this->assessments = $this->page
                ->getProject()
                ->getPageAssessments()
                ->getAssessments($this->page);
        }
        return $this->assessments;
    }

    /**
     * Get the list of the page's Wikidata and CheckWiki errors.
     * @see Page::getErrors()
     * @return string[]
     */
    public function getBugs(): array
    {
        if (!isset($this->bugs)) {
            $this->bugs = $this->page->getErrors();
        }
        return $this->bugs;
    }

    /**
     * Get the number of Wikidata and CheckWiki errors.
     * @return int
     */
    public function numBugs(): int
    {
        return count($this->getBugs());
    }

    /**
     * Generate the data structure that will be used in the PageInfo API response.
     * @param Project $project
     * @param Page $page
     * @return array
     * @codeCoverageIgnore
     */
    public function getPageInfoApiData(Project $project, Page $page): array
    {
        $data = [
            'project' => $project->getDomain(),
            'page' => $page->getTitle(),
            'watchers' => (int) $page->getWatchers(),
            'pageviews' => $page->getLatestPageviews(),
            'pageviews_offset' => self::PAGEVIEWS_OFFSET,
        ];

        // Stays null if the query below throws, in which case the revision data is omitted.
        $info = null;

        try {
            $info = $this->repository->getBasicEditingInfo($page);
        } catch (ServiceUnavailableHttpException $e) {
            // No more open database connections.
            $data['error'] = 'Unable to fetch revision data. Please try again later.';
        } catch (HttpException $e) {
            /**
             * The query most likely exceeded the maximum query time,
             * so we'll abort and give only info retrieved by the API.
             */
            $data['error'] = 'Unable to fetch revision data. The query may have timed out.';
        }

        if ($info) {
            $creationDateTime = DateTime::createFromFormat('YmdHis', $info['created_at']);
            $modifiedDateTime = DateTime::createFromFormat('YmdHis', $info['modified_at']);
            $secsSinceLastEdit = (new DateTime)->getTimestamp() - $modifiedDateTime->getTimestamp();

            // Some wikis (such as foundation.wikimedia.org) may be missing the creation date.
            $creationDateTime = false === $creationDateTime
                ? null
                : $creationDateTime->format('Y-m-d');

            $assessment = $page->getProject()
                ->getPageAssessments()
                ->getAssessment($page);

            $data = array_merge($data, [
                'revisions' => (int) $info['num_edits'],
                'editors' => (int) $info['num_editors'],
                'ip_edits' => (int) $info['ip_edits'],
                'minor_edits' => (int) $info['minor_edits'],
                'author' => $info['author'],
                'author_editcount' => null === $info['author_editcount'] ? null : (int) $info['author_editcount'],
                'created_at' => $creationDateTime,
                'created_rev_id' => $info['created_rev_id'],
                'modified_at' => $modifiedDateTime->format('Y-m-d H:i'),
                'secs_since_last_edit' => $secsSinceLastEdit,
                'last_edit_id' => (int) $info['modified_rev_id'],
                'assessment' => $assessment,
            ]);
        }

        return $data;
    }

    /************************ Link statistics ************************/

    /**
     * Get the number of external links on the page.
     * @return int
     */
    public function linksExtCount(): int
    {
        return $this->getLinksAndRedirects()['links_ext_count'];
    }

    /**
     * Get the number of incoming links to the page.
     * @return int
     */
    public function linksInCount(): int
    {
        return $this->getLinksAndRedirects()['links_in_count'];
    }

    /**
     * Get the number of outgoing links from the page.
     * @return int
     */
    public function linksOutCount(): int
    {
        return $this->getLinksAndRedirects()['links_out_count'];
    }

    /**
     * Get the number of redirects to the page.
     * @return int
     */
    public function redirectsCount(): int
    {
        return $this->getLinksAndRedirects()['redirects_count'];
    }

    /**
     * Get the number of external, incoming and outgoing links, along with the number of redirects to the page.
     * @return int[]
     * @codeCoverageIgnore
     */
    private function getLinksAndRedirects(): array
    {
        if (!isset($this->linksAndRedirects)) {
            $this->linksAndRedirects = $this->page->countLinksAndRedirects();
        }
        return $this->linksAndRedirects;
    }

    /**
     * Fetch transclusion data (categories, templates and files) that are on the page.
     * @return array With keys 'categories', 'templates' and 'files'.
     */
    public function getTransclusionData(): array
    {
        if (!isset($this->transclusionData)) {
            $this->transclusionData = $this->repository->getTransclusionData($this->page);
        }
        return $this->transclusionData;
    }

    /**
     * Get the number of categories that are on the page.
     * @return int
     */
    public function getNumCategories(): int
    {
        return $this->getTransclusionData()['categories'];
    }

    /**
     * Get the number of templates that are on the page.
     * @return int
     */
    public function getNumTemplates(): int
    {
        return $this->getTransclusionData()['templates'];
    }

    /**
     * Get the number of files that are on the page.
     * @return int
     */
    public function getNumFiles(): int
    {
        return $this->getTransclusionData()['files'];
    }

    /************************ Bot statistics ************************/

    /**
     * Number of edits made to the page by current or former bots.
     * @param string[][] $bots Used only in unit tests, where we supply mock data for the bots that will get processed.
     * @return int
     */
    public function getBotRevisionCount(?array $bots = null): int
    {
        if (isset($this->botRevisionCount)) {
            return $this->botRevisionCount;
        }

        if (null === $bots) {
            $bots = $this->getBots();
        }

        $count = 0;

        foreach (array_values($bots) as $data) {
            $count += $data['count'];
        }

        $this->botRevisionCount = $count;
        return $count;
    }

    /**
     * Get data about bots that edited the page, and store it in $this->bots. This is done separately from the
     * main query because we use this information when computing the top 10 editors in PageInfo, where we don't
     * want to include bots.
     * @return array
     */
    public function getBots(): array
    {
        if (isset($this->bots)) {
            return $this->bots;
        }

        // Parse the bot edits.
        $this->bots = [];

        $limit = $this->tooManyRevisions() ? $this->repository->getMaxPageRevisions() : null;

        $botData = $this->repository->getBotData($this->page, $this->start, $this->end, $limit);
        while ($bot = $botData->fetchAssociative()) {
            $this->bots[$bot['username']] = [
                'count' => (int)$bot['count'],
                'current' => '1' === $bot['current'],
            ];
        }

        // Sort by edit count.
        uasort($this->bots, function ($a, $b) {
            return $b['count'] - $a['count'];
        });

        return $this->bots;
    }

    /**
     * Get the number of bots that edited the page.
     * @return int
     */
    public function getNumBots(): int
    {
        return count($this->getBots());
    }

    /**
     * Get counts of (semi-)automated tools used to edit the page.
     * @return array
     */
    public function getAutoEditsCounts(): array
    {
        return $this->repository->getAutoEditsCounts($this->page, $this->start, $this->end);
    }
}