Passed
Pull Request — main (#426)
by MusikAnimal
08:27 queued 04:14
created

ArticleInfoApi   B

Complexity

Total Complexity 46

Size/Duplication

Total Lines 459
Duplicated Lines 0 %

Importance

Changes 0
Metric Value
eloc 159
dl 0
loc 459
rs 8.72
c 0
b 0
f 0
wmc 46

24 Methods

Rating  Name                        Duplication  Size  Complexity
A       getNumBots()                0            3     1
A       linksExtCount()             0            3     1
A       redirectsCount()            0            3     1
A       getTopEditorsByEditCount()  0            36    3
A       getNumCategories()          0            3     1
A       getBots()                   0            26    4
A       linksOutCount()             0            3     1
A       getNumRevisions()           0            6     2
A       getProseStats()             0            30    3
A       getTransclusionData()       0            7     2
A       getAssessments()            0            9     2
A       getLinksAndRedirects()      0            6     2
A       getBugs()                   0            6     2
A       getBasicEditingInfo()       0            3     1
A       linksInCount()              0            3     1
A       getMaxRevisions()           0            6     2
A       tooManyRevisions()          0            3     2
A       getNumFiles()               0            3     1
A       __construct()               0            6     1
A       countCharsAndWords()        0            13    1
A       numBugs()                   0            3     1
B       getArticleInfoApiData()     0            60    6
A       getBotRevisionCount()       0            18    4
A       getNumTemplates()           0            3     1

How to fix   Complexity   

Complex Class

Complex classes like ArticleInfoApi often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common way to find such a component is to look for fields and methods that share the same prefix or suffix.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

While breaking up the class, it is a good idea to analyze how other classes use ArticleInfoApi, and based on these observations, apply Extract Interface, too.
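
As a rough illustration of Extract Class applied to this file: linksExtCount(), linksInCount(), linksOutCount() and redirectsCount() share a naming pattern and all read from one cached array, so they are a plausible candidate for extraction. The sketch below is hypothetical and not part of the project. The LinkStats class name, its constructor, and its method names are assumptions; only the array keys are taken from the class under review.

```php
<?php
declare(strict_types = 1);

/**
 * Hypothetical value object for the link counts that ArticleInfoApi
 * currently exposes through four separate accessor methods.
 */
final class LinkStats
{
    /** @var int[] Counts, keyed as in ArticleInfoApi::getLinksAndRedirects(). */
    private $counts;

    /**
     * @param int[] $counts Expected keys: links_ext_count, links_in_count,
     *   links_out_count, redirects_count. Missing keys default to 0.
     */
    public function __construct(array $counts)
    {
        // Array union keeps the supplied values and fills in missing keys.
        $this->counts = $counts + [
            'links_ext_count' => 0,
            'links_in_count' => 0,
            'links_out_count' => 0,
            'redirects_count' => 0,
        ];
    }

    public function external(): int
    {
        return (int) $this->counts['links_ext_count'];
    }

    public function incoming(): int
    {
        return (int) $this->counts['links_in_count'];
    }

    public function outgoing(): int
    {
        return (int) $this->counts['links_out_count'];
    }

    public function redirects(): int
    {
        return (int) $this->counts['redirects_count'];
    }
}

// Usage: the owning class would build this once from the cached query result.
$stats = new LinkStats(['links_in_count' => 12, 'redirects_count' => 3]);
echo $stats->incoming(), ' incoming, ', $stats->redirects(), " redirects\n";
```

With something like this in place, ArticleInfoApi would need a single accessor returning the object instead of four pass-through methods, which is where the reduction in method count and complexity would come from.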

<?php
declare(strict_types = 1);

namespace App\Model;

use App\Repository\ArticleInfoRepository;
use DateTime;
use Doctrine\DBAL\Statement;
use Symfony\Component\DependencyInjection\ContainerInterface;
use Symfony\Component\DomCrawler\Crawler;
use Symfony\Component\HttpKernel\Exception\HttpException;
use Symfony\Component\HttpKernel\Exception\ServiceUnavailableHttpException;

/**
 * An ArticleInfoApi is standalone logic for the Article Info tool. These methods perform SQL queries
 * or make API requests and can be called directly, without any knowledge of the child ArticleInfo class.
 * It does require that the ArticleInfoRepository be set, however.
 * @see ArticleInfo
 */
class ArticleInfoApi extends Model
{
    /** @var ContainerInterface The application's DI container. */
    protected $container;

    /** @var int Number of revisions that belong to the page. */
    protected $numRevisions;

    /** @var int Maximum number of revisions to process, as configured. */
    protected $maxRevisions;

    /** @var mixed[] Prose stats, with keys 'characters', 'words', 'references', 'unique_references', 'sections'. */
    protected $proseStats;

    /** @var array Number of categories, templates and files on the page. */
    protected $transclusionData;

    /** @var mixed[] Various statistics about bots that edited the page. */
    protected $bots;

    /** @var int Number of edits made to the page by bots. */
    protected $botRevisionCount;

    /** @var int[] Number of incoming and outgoing links and redirects to the page. */
    protected $linksAndRedirects;

    /** @var string[] Assessments of the page (see Page::getAssessments). */
    protected $assessments;

    /** @var string[] List of Wikidata and CheckWiki errors. */
    protected $bugs;

    /**
     * ArticleInfoApi constructor.
     * @param Page $page The page to process.
     * @param ContainerInterface $container The DI container.
     * @param false|int $start Start date as Unix timestamp.
     * @param false|int $end End date as Unix timestamp.
     */
    public function __construct(Page $page, ContainerInterface $container, $start = false, $end = false)
    {
        $this->page = $page;
        $this->container = $container;
        $this->start = $start;
        $this->end = $end;
    }

    /**
     * Get the number of revisions belonging to the page.
     * @return int
     */
    public function getNumRevisions(): int
    {
        if (!isset($this->numRevisions)) {
            $this->numRevisions = $this->page->getNumRevisions(null, $this->start, $this->end);
        }
        return $this->numRevisions;
    }

    /**
     * Are there more revisions than we should process, based on the config?
     * @return bool
     */
    public function tooManyRevisions(): bool
    {
        return $this->getMaxRevisions() > 0 && $this->getNumRevisions() > $this->getMaxRevisions();
    }

    /**
     * Get the maximum number of revisions that we should process.
     * @return int
     */
    public function getMaxRevisions(): int
    {
        if (!isset($this->maxRevisions)) {
            $this->maxRevisions = (int) $this->container->getParameter('app.max_page_revisions');
        }
        return $this->maxRevisions;
    }

    /**
     * Get various basic info used in the API, including the number of revisions, unique authors, initial author
     * and edit count of the initial author. This is combined into one query for better performance. Caching is
     * intentionally disabled, because using the gadget, this will get hit for a different page constantly, where
     * the likelihood of cache benefiting us is slim.
     * @return string[]|false false if the page was not found.
     */
    public function getBasicEditingInfo()
    {
        return $this->getRepository()->getBasicEditingInfo($this->page);

Bug: The method getBasicEditingInfo() does not exist on App\Repository\Repository. It seems like you code against a sub-type of App\Repository\Repository such as App\Repository\ArticleInfoRepository. (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

        return $this->getRepository()->/** @scrutinizer ignore-call */ getBasicEditingInfo($this->page);

    }
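
The "does not exist on App\Repository\Repository" reports (here and on getTopEditorsByEditCount() and getBotData()) arise because getRepository() is typed as the base class. One possible way to address them without per-call annotations is to narrow the return type in the subclass. The sketch below uses stand-in classes so that it runs on its own; the shapes of Model and Repository, the setRepository() method, and the stubbed getBasicEditingInfo() signature are assumptions for illustration, not the project's actual definitions.

```php
<?php
declare(strict_types = 1);

// Stand-ins for App\Repository\Repository and App\Repository\ArticleInfoRepository.
class Repository
{
}

class ArticleInfoRepository extends Repository
{
    /** Stub for illustration; the real method runs a database query. */
    public function getBasicEditingInfo(string $pageTitle): array
    {
        return ['page' => $pageTitle];
    }
}

// Stand-in for App\Model\Model, which hands back the base repository type.
abstract class Model
{
    /** @var Repository|null */
    protected $repository;

    public function setRepository(Repository $repository): void
    {
        $this->repository = $repository;
    }

    public function getRepository(): Repository
    {
        return $this->repository;
    }
}

class ArticleInfoApi extends Model
{
    /**
     * Narrow the return type so callers and static analysis see the subtype.
     * Covariant return types require PHP 7.4 or later.
     */
    public function getRepository(): ArticleInfoRepository
    {
        if (!$this->repository instanceof ArticleInfoRepository) {
            throw new LogicException('An ArticleInfoRepository must be set.');
        }
        return $this->repository;
    }
}

$api = new ArticleInfoApi();
$api->setRepository(new ArticleInfoRepository());
$info = $api->getRepository()->getBasicEditingInfo('Example');
```

The instanceof check also turns the documented requirement ("It does require that the ArticleInfoRepository be set") into a runtime guarantee rather than a convention.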

    /**
     * Get the top editors to the page by edit count.
     * @param int $limit Default 20, maximum 1,000.
     * @param bool $noBots Set to non-false to exclude bots from the result.
     * @return array
     */
    public function getTopEditorsByEditCount(int $limit = 20, bool $noBots = false): array
    {
        // Quick cache, valid only for the same request.
        static $topEditors = null;
        if (null !== $topEditors) {
            return $topEditors;
        }

        $rows = $this->getRepository()->getTopEditorsByEditCount(

Bug: The method getTopEditorsByEditCount() does not exist on App\Repository\Repository. It seems like you code against a sub-type of App\Repository\Repository such as App\Repository\ArticleInfoRepository. (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

        $rows = $this->getRepository()->/** @scrutinizer ignore-call */ getTopEditorsByEditCount(

            $this->page,
            $this->start,
            $this->end,
            min($limit, 1000),
            $noBots
        );

        $topEditors = [];
        $rank = 0;
        foreach ($rows as $row) {
            $topEditors[] = [
                'rank' => ++$rank,
                'username' => $row['username'],
                'count' => $row['count'],
                'minor' => $row['minor'],
                'first_edit' => [
                    'id' => $row['first_revid'],
                    'timestamp' => $row['first_timestamp'],
                ],
                'latest_edit' => [
                    'id' => $row['latest_revid'],
                    'timestamp' => $row['latest_timestamp'],
                ],
            ];
        }

        return $topEditors;
    }

    /**
     * Get prose and reference information.
     * @return array With keys 'characters', 'words', 'references', 'unique_references', 'sections'
     */
    public function getProseStats(): array
    {
        if (isset($this->proseStats)) {
            return $this->proseStats;
        }

        $datetime = is_int($this->end) ? new DateTime("@$this->end") : null;
        $html = $this->page->getHTMLContent($datetime);

        $crawler = new Crawler($html);

        [$chars, $words] = $this->countCharsAndWords($crawler, '#mw-content-text p');

        $refs = $crawler->filter('#mw-content-text .reference');
        $refContent = [];
        $refs->each(function ($ref) use (&$refContent): void {
            $refContent[] = $ref->text();
        });
        $uniqueRefs = count(array_unique($refContent));

        $sections = count($crawler->filter('#mw-content-text .mw-headline'));

        $this->proseStats = [
            'characters' => $chars,
            'words' => $words,
            'references' => $refs->count(),
            'unique_references' => $uniqueRefs,
            'sections' => $sections,
        ];
        return $this->proseStats;
    }

    /**
     * Count the number of characters and words of the plain text within the DOM element matched by the given selector.
     * @param Crawler $crawler
     * @param string $selector HTML selector.
     * @return array [num chars, num words]
     */
    private function countCharsAndWords(Crawler $crawler, string $selector): array
    {
        $totalChars = 0;
        $totalWords = 0;
        $paragraphs = $crawler->filter($selector);
        $paragraphs->each(function ($node) use (&$totalChars, &$totalWords): void {
            /** @var Crawler $node */
            $text = preg_replace('/\[\d+]/', '', trim($node->text(null, true)));
            $totalChars += strlen($text);
            $totalWords += count(explode(' ', $text));
        });

        return [$totalChars, $totalWords];
    }

    /**
     * Get the page assessments of the page.
     * @see https://www.mediawiki.org/wiki/Extension:PageAssessments
     * @return string[]|false False if unsupported.
     * @codeCoverageIgnore
     */
    public function getAssessments()
    {
        if (!is_array($this->assessments)) {

Issue: The condition is_array($this->assessments) is always true.

            $this->assessments = $this->page
                ->getProject()
                ->getPageAssessments()
                ->getAssessments($this->page);
        }
        return $this->assessments;
    }
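
The "always true" reports on the is_array() guards (here and in getBugs(), getLinksAndRedirects() and getTransclusionData()) most likely follow from the property docblocks: a property documented as `string[]` or `int[]` is assumed to always hold an array. A minimal sketch of one way to make the lazy-loading intent explicit is shown below, assuming the property is documented as nullable and compared against null. The AssessmentCache class, its load() helper, and the sample data are hypothetical, and the sketch leaves out the `false` ("unsupported") return value that the real getAssessments() allows.

```php
<?php
declare(strict_types = 1);

/** Hypothetical, trimmed-down holder showing only the lazy-loading pattern. */
class AssessmentCache
{
    /** @var string[]|null Null until first requested. */
    private $assessments = null;

    /** @var int Number of times the loader ran (for demonstration only). */
    public $loads = 0;

    /** @return string[] */
    public function getAssessments(): array
    {
        // An explicit null check matches the documented type, so an analyzer
        // no longer has grounds to treat the guard as redundant.
        if (null === $this->assessments) {
            $this->assessments = $this->load();
        }
        return $this->assessments;
    }

    /** Stand-in for the call chain through Page and PageAssessments. */
    private function load(): array
    {
        $this->loads++;
        return ['class' => 'B', 'importance' => 'Mid'];
    }
}

$cache = new AssessmentCache();
$cache->getAssessments();
$cache->getAssessments();
echo $cache->loads, "\n"; // The loader ran once; the second call hit the cache.
```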

    /**
     * Get the list of the page's Wikidata and CheckWiki errors.
     * @see Page::getErrors()
     * @return string[]
     */
    public function getBugs(): array
    {
        if (!is_array($this->bugs)) {

Issue: The condition is_array($this->bugs) is always true.

            $this->bugs = $this->page->getErrors();
        }
        return $this->bugs;
    }

    /**
     * Get the number of Wikidata and CheckWiki errors.
     * @return int
     */
    public function numBugs(): int
    {
        return count($this->getBugs());
    }

    /**
     * Generate the data structure that will be used in the ArticleInfo API response.
     * @param Project $project
     * @param Page $page
     * @return array
     * @codeCoverageIgnore
     */
    public function getArticleInfoApiData(Project $project, Page $page): array
    {
        /** @var int $pageviewsOffset Number of days to query for pageviews */
        $pageviewsOffset = 30;

        $data = [
            'project' => $project->getDomain(),
            'page' => $page->getTitle(),
            'watchers' => (int) $page->getWatchers(),
            'pageviews' => $page->getLastPageviews($pageviewsOffset),
            'pageviews_offset' => $pageviewsOffset,
        ];

        $info = false;

        try {
            $articleInfoRepo = new ArticleInfoRepository();
            $articleInfoRepo->setContainer($this->container);
            $info = $articleInfoRepo->getBasicEditingInfo($page);
        } catch (ServiceUnavailableHttpException $e) {
            // No more open database connections.
            $data['error'] = 'Unable to fetch revision data. Please try again later.';
        } catch (HttpException $e) {
            /**
             * The query most likely exceeded the maximum query time,
             * so we'll abort and give only info retrieved by the API.
             */
            $data['error'] = 'Unable to fetch revision data. The query may have timed out.';
        }

        if (false !== $info) {
            $creationDateTime = DateTime::createFromFormat('YmdHis', $info['created_at']);
            $modifiedDateTime = DateTime::createFromFormat('YmdHis', $info['modified_at']);
            $secsSinceLastEdit = (new DateTime)->getTimestamp() - $modifiedDateTime->getTimestamp();

            // Some wikis (such as foundation.wikimedia.org) may be missing the creation date.
            $creationDateTime = false === $creationDateTime
                ? null
                : $creationDateTime->format('Y-m-d');

            $assessment = $page->getProject()
                ->getPageAssessments()
                ->getAssessment($page);

            $data = array_merge($data, [
                'revisions' => (int) $info['num_edits'],
                'editors' => (int) $info['num_editors'],
                'minor_edits' => (int) $info['minor_edits'],
                'author' => $info['author'],
                'author_editcount' => null === $info['author_editcount'] ? null : (int) $info['author_editcount'],

Issue: The condition null === $info['author_editcount'] is always false.

                'created_at' => $creationDateTime,
                'created_rev_id' => $info['created_rev_id'],
                'modified_at' => $modifiedDateTime->format('Y-m-d H:i'),
                'secs_since_last_edit' => $secsSinceLastEdit,
                'last_edit_id' => (int) $info['modified_rev_id'],
                'assessment' => $assessment,
            ]);
        }

        return $data;
    }

    /************************ Link statistics ************************/

    /**
     * Get the number of external links on the page.
     * @return int
     */
    public function linksExtCount(): int
    {
        return $this->getLinksAndRedirects()['links_ext_count'];
    }

    /**
     * Get the number of incoming links to the page.
     * @return int
     */
    public function linksInCount(): int
    {
        return $this->getLinksAndRedirects()['links_in_count'];
    }

    /**
     * Get the number of outgoing links from the page.
     * @return int
     */
    public function linksOutCount(): int
    {
        return $this->getLinksAndRedirects()['links_out_count'];
    }

    /**
     * Get the number of redirects to the page.
     * @return int
     */
    public function redirectsCount(): int
    {
        return $this->getLinksAndRedirects()['redirects_count'];
    }

    /**
     * Get the number of external, incoming and outgoing links, along with the number of redirects to the page.
     * @return int[]
     * @codeCoverageIgnore
     */
    private function getLinksAndRedirects(): array
    {
        if (!is_array($this->linksAndRedirects)) {

Issue: The condition is_array($this->linksAndRedirects) is always true.

            $this->linksAndRedirects = $this->page->countLinksAndRedirects();
        }
        return $this->linksAndRedirects;
    }

    /**
     * Fetch transclusion data (categories, templates and files) that are on the page.
     * @return array With keys 'categories', 'templates' and 'files'.
     */
    public function getTransclusionData(): array
    {
        if (!is_array($this->transclusionData)) {

Issue: The condition is_array($this->transclusionData) is always true.

            $this->transclusionData = $this->getRepository()
                ->getTransclusionData($this->page);
        }
        return $this->transclusionData;
    }

    /**
     * Get the number of categories that are on the page.
     * @return int
     */
    public function getNumCategories(): int
    {
        return $this->getTransclusionData()['categories'];
    }

    /**
     * Get the number of templates that are on the page.
     * @return int
     */
    public function getNumTemplates(): int
    {
        return $this->getTransclusionData()['templates'];
    }

    /**
     * Get the number of files that are on the page.
     * @return int
     */
    public function getNumFiles(): int
    {
        return $this->getTransclusionData()['files'];
    }

    /************************ Bot statistics ************************/

    /**
     * Number of edits made to the page by current or former bots.
     * @param string[] $bots Used only in unit tests, where we supply mock data for the bots that will get processed.
     * @return int
     */
    public function getBotRevisionCount(?array $bots = null): int
    {
        if (isset($this->botRevisionCount)) {
            return $this->botRevisionCount;
        }

        if (null === $bots) {
            $bots = $this->getBots();
        }

        $count = 0;

        foreach (array_values($bots) as $data) {
            $count += $data['count'];
        }

        $this->botRevisionCount = $count;
        return $count;
    }

    /**
     * Get statistics about bots that edited the page, caching them in $this->bots. This is done separately from
     * the main query because we use this information when computing the top 10 editors in ArticleInfo, where we
     * don't want to include bots.
     * @return mixed[]
     */
    public function getBots(): array
    {
        if (isset($this->bots)) {
            return $this->bots;
        }

        // Parse the bot edits.
        $this->bots = [];

        $limit = $this->tooManyRevisions() ? $this->getMaxRevisions() : null;

        /** @var Statement $botData */
        $botData = $this->getRepository()->getBotData($this->page, $this->start, $this->end, $limit);

Bug: The method getBotData() does not exist on App\Repository\Repository. It seems like you code against a sub-type of App\Repository\Repository such as App\Repository\ArticleInfoRepository. (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

        $botData = $this->getRepository()->/** @scrutinizer ignore-call */ getBotData($this->page, $this->start, $this->end, $limit);

        while ($bot = $botData->fetch()) {

Deprecated code: The function Doctrine\DBAL\Statement::fetch() has been deprecated: Use fetchNumeric(), fetchAssociative() or fetchOne() instead. (Ignorable by annotation.)

If this is a false positive, you can also ignore this issue in your code via the ignore-deprecated annotation:

        while ($bot = /** @scrutinizer ignore-deprecated */ $botData->fetch()) {

This function has been deprecated. The supplier of the function has supplied an explanatory message. That message should give you some clue as to whether and when the function will be removed and what other function to use instead.

            $this->bots[$bot['username']] = [
                'count' => (int)$bot['count'],
                'current' => '1' === $bot['current'],
            ];
        }

        // Sort by edit count.
        uasort($this->bots, function ($a, $b) {
            return $b['count'] - $a['count'];
        });

        return $this->bots;
    }
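
The deprecation notice on Statement::fetch() names fetchAssociative() as one replacement. A minimal sketch of the getBots() loop rewritten around that call follows. So that it runs without Doctrine installed, it uses a hypothetical FakeResult stand-in that mimics only the documented behaviour of fetchAssociative() (next row as an associative array, or false when exhausted); whether the object returned by getBotData() exposes that method depends on the installed DBAL version. The collectBots() function name and the sample rows are also assumptions.

```php
<?php
declare(strict_types = 1);

/**
 * Stand-in for a Doctrine DBAL result, mimicking only fetchAssociative():
 * it returns the next row as an associative array, or false when exhausted.
 */
class FakeResult
{
    /** @var array[] */
    private $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    /** @return array|false */
    public function fetchAssociative()
    {
        $row = array_shift($this->rows);
        return null === $row ? false : $row;
    }
}

/**
 * Build the bots array the way getBots() does, using fetchAssociative().
 * @return array[] Keyed by username, sorted by edit count, descending.
 */
function collectBots(FakeResult $botData): array
{
    $bots = [];
    while (false !== ($bot = $botData->fetchAssociative())) {
        $bots[$bot['username']] = [
            'count' => (int) $bot['count'],
            'current' => '1' === $bot['current'],
        ];
    }

    // The spaceship operator expresses the same descending comparison
    // as the subtraction used in the original callback.
    uasort($bots, function (array $a, array $b): int {
        return $b['count'] <=> $a['count'];
    });

    return $bots;
}

$bots = collectBots(new FakeResult([
    ['username' => 'ExampleBot', 'count' => '5', 'current' => '1'],
    ['username' => 'OldBot', 'count' => '12', 'current' => '0'],
]));
print_r(array_keys($bots));
```

Comparing against false explicitly, rather than relying on truthiness, keeps the loop condition tied to the end-of-result signal.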

    /**
     * Get the number of bots that edited the page.
     * @return int
     */
    public function getNumBots(): int
    {
        return count($this->getBots());
    }
}