Test Setup Failed
Pull Request — main (#426)
by MusikAnimal
17:10 queued 11:44
created

ArticleInfoApi   B

Complexity

Total Complexity 46

Size/Duplication

Total Lines 459
Duplicated Lines 0 %

Importance

Changes 0
Metric Value
eloc 159
dl 0
loc 459
rs 8.72
c 0
b 0
f 0
wmc 46

24 Methods

Rating   Name   Duplication   Size   Complexity  
A getNumBots() 0 3 1
A linksExtCount() 0 3 1
A redirectsCount() 0 3 1
A getTopEditorsByEditCount() 0 36 3
A getNumCategories() 0 3 1
A getBots() 0 26 4
A linksOutCount() 0 3 1
A getNumRevisions() 0 6 2
A getProseStats() 0 30 3
A getTransclusionData() 0 7 2
A getAssessments() 0 9 2
A getLinksAndRedirects() 0 6 2
A getBugs() 0 6 2
A getBasicEditingInfo() 0 3 1
A linksInCount() 0 3 1
A getMaxRevisions() 0 6 2
A tooManyRevisions() 0 3 2
A getNumFiles() 0 3 1
A __construct() 0 6 1
A countCharsAndWords() 0 13 1
A numBugs() 0 3 1
B getArticleInfoApiData() 0 60 6
A getBotRevisionCount() 0 18 4
A getNumTemplates() 0 3 1

How to fix   Complexity   

Complex Class

Complex classes like ArticleInfoApi often do a lot of different things. To break such a class down, we need to identify a cohesive component within it. A common way to find such a component is to look for fields and methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
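As a rough sketch of Extract Class applied to this file, the four link counters (linksExtCount(), linksInCount(), linksOutCount(), redirectsCount()) form one cohesive group and could move into their own class. LinkStatistics is a hypothetical name that does not exist in the codebase, and the sample numbers are made up; only the array keys are taken from the listing.

```php
<?php
declare(strict_types = 1);

/**
 * Hypothetical class extracted from ArticleInfoApi; it holds only the link counters.
 */
final class LinkStatistics
{
    /** @var int[] Counts keyed by 'links_ext_count', 'links_in_count', 'links_out_count', 'redirects_count'. */
    private $counts;

    /**
     * @param int[] $counts Raw counts, assumed to have the shape used by getLinksAndRedirects().
     */
    public function __construct(array $counts)
    {
        $this->counts = $counts;
    }

    public function linksExtCount(): int
    {
        return $this->counts['links_ext_count'];
    }

    public function linksInCount(): int
    {
        return $this->counts['links_in_count'];
    }

    public function linksOutCount(): int
    {
        return $this->counts['links_out_count'];
    }

    public function redirectsCount(): int
    {
        return $this->counts['redirects_count'];
    }
}

// Usage with made-up numbers.
$stats = new LinkStatistics([
    'links_ext_count' => 12,
    'links_in_count' => 340,
    'links_out_count' => 57,
    'redirects_count' => 3,
]);
echo $stats->linksInCount(), PHP_EOL;
```

ArticleInfoApi would then hold a single LinkStatistics instance and delegate to it, which removes four methods and one field from the larger class.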

While breaking up the class, it is a good idea to analyze how other classes use ArticleInfoApi, and based on these observations, apply Extract Interface, too.
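To illustrate Extract Interface, the sketch below assumes that some callers only need the bot-related methods. BotStatisticsInterface, InMemoryBotStatistics and summarizeBots() are hypothetical names introduced for this example; the method signatures mirror those in the listing.

```php
<?php
declare(strict_types = 1);

/** Hypothetical interface: the bot-related subset of ArticleInfoApi that a caller might need. */
interface BotStatisticsInterface
{
    public function getBots(): array;

    public function getNumBots(): int;

    public function getBotRevisionCount(?array $bots = null): int;
}

/** Minimal in-memory implementation, for illustration only. */
final class InMemoryBotStatistics implements BotStatisticsInterface
{
    /** @var array[] Bot data keyed by username, each with a 'count' entry. */
    private $bots;

    public function __construct(array $bots)
    {
        $this->bots = $bots;
    }

    public function getBots(): array
    {
        return $this->bots;
    }

    public function getNumBots(): int
    {
        return count($this->bots);
    }

    public function getBotRevisionCount(?array $bots = null): int
    {
        // Sum the 'count' column of the supplied data, or of the stored data.
        return array_sum(array_column($bots ?? $this->bots, 'count'));
    }
}

/** A caller that depends only on the interface, not on ArticleInfoApi itself. */
function summarizeBots(BotStatisticsInterface $stats): string
{
    return sprintf('%d bots made %d edits', $stats->getNumBots(), $stats->getBotRevisionCount());
}

$stats = new InMemoryBotStatistics([
    'ExampleBot' => ['count' => 5, 'current' => true],
    'OldBot' => ['count' => 2, 'current' => false],
]);
echo summarizeBots($stats), PHP_EOL; // prints "2 bots made 7 edits"
```

Because summarizeBots() type-hints the interface, a unit test can pass a lightweight implementation instead of constructing a full ArticleInfoApi with a Page and a container.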

<?php
declare(strict_types = 1);

namespace App\Model;

use App\Repository\ArticleInfoRepository;
use DateTime;
use Doctrine\DBAL\Driver\ResultStatement;
use Symfony\Component\DependencyInjection\ContainerInterface;
use Symfony\Component\DomCrawler\Crawler;
use Symfony\Component\HttpKernel\Exception\HttpException;
use Symfony\Component\HttpKernel\Exception\ServiceUnavailableHttpException;

/**
 * An ArticleInfoApi is standalone logic for the Article Info tool. These methods perform SQL queries
 * or make API requests and can be called directly, without any knowledge of the child ArticleInfo class.
 * It does require that the ArticleInfoRepository be set, however.
 * @see ArticleInfo
 */
class ArticleInfoApi extends Model
{
    /** @var ContainerInterface The application's DI container. */
    protected $container;

    /** @var int Number of revisions that belong to the page. */
    protected $numRevisions;

    /** @var int Maximum number of revisions to process, as configured. */
    protected $maxRevisions;

    /** @var mixed[] Prose stats, with keys 'characters', 'words', 'references', 'unique_references', 'sections'. */
    protected $proseStats;

    /** @var array Number of categories, templates and files on the page. */
    protected $transclusionData;

    /** @var mixed[] Various statistics about bots that edited the page. */
    protected $bots;

    /** @var int Number of edits made to the page by bots. */
    protected $botRevisionCount;

    /** @var int[] Number of in and outgoing links and redirects to the page. */
    protected $linksAndRedirects;

    /** @var string[] Assessments of the page (see Page::getAssessments). */
    protected $assessments;

    /** @var string[] List of Wikidata and Checkwiki errors. */
    protected $bugs;

    /**
     * ArticleInfoApi constructor.
     * @param Page $page The page to process.
     * @param ContainerInterface $container The DI container.
     * @param false|int $start Start date as Unix timestamp.
     * @param false|int $end End date as Unix timestamp.
     */
    public function __construct(Page $page, ContainerInterface $container, $start = false, $end = false)
    {
        $this->page = $page;
        $this->container = $container;
        $this->start = $start;
        $this->end = $end;
    }

    /**
     * Get the number of revisions belonging to the page.
     * @return int
     */
    public function getNumRevisions(): int
    {
        if (!isset($this->numRevisions)) {
            $this->numRevisions = $this->page->getNumRevisions(null, $this->start, $this->end);
        }
        return $this->numRevisions;
    }

    /**
     * Are there more revisions than we should process, based on the config?
     * @return bool
     */
    public function tooManyRevisions(): bool
    {
        return $this->getMaxRevisions() > 0 && $this->getNumRevisions() > $this->getMaxRevisions();
    }

    /**
     * Get the maximum number of revisions that we should process.
     * @return int
     */
    public function getMaxRevisions(): int
    {
        if (!isset($this->maxRevisions)) {
            $this->maxRevisions = (int) $this->container->getParameter('app.max_page_revisions');
        }
        return $this->maxRevisions;
    }

    /**
     * Get various basic info used in the API, including the number of revisions, unique authors, initial author
     * and edit count of the initial author. This is combined into one query for better performance. Caching is
     * intentionally disabled, because using the gadget, this will get hit for a different page constantly, where
     * the likelihood of cache benefiting us is slim.
     * @return string[]|false false if the page was not found.
     */
    public function getBasicEditingInfo()
    {
        return $this->getRepository()->getBasicEditingInfo($this->page);
Bug: The method getBasicEditingInfo() does not exist on App\Repository\Repository. It seems like you code against a sub-type of App\Repository\Repository such as App\Repository\ArticleInfoRepository.
If this is a false-positive, you can also ignore this issue in your code via the ignore-call annotation:
        return $this->getRepository()->/** @scrutinizer ignore-call */ getBasicEditingInfo($this->page);
    }

    /**
     * Get the top editors to the page by edit count.
     * @param int $limit Default 20, maximum 1,000.
     * @param bool $noBots Set to non-false to exclude bots from the result.
     * @return array
     */
    public function getTopEditorsByEditCount(int $limit = 20, bool $noBots = false): array
    {
        // Quick cache, valid only for the same request.
        static $topEditors = null;
        if (null !== $topEditors) {
            return $topEditors;
        }

        $rows = $this->getRepository()->getTopEditorsByEditCount(
Bug: The method getTopEditorsByEditCount() does not exist on App\Repository\Repository. It seems like you code against a sub-type of App\Repository\Repository such as App\Repository\ArticleInfoRepository.
If this is a false-positive, you can also ignore this issue in your code via the ignore-call annotation:
        $rows = $this->getRepository()->/** @scrutinizer ignore-call */ getTopEditorsByEditCount(
            $this->page,
            $this->start,
            $this->end,
            min($limit, 1000),
            $noBots
        );

        $topEditors = [];
        $rank = 0;
        foreach ($rows as $row) {
            $topEditors[] = [
                'rank' => ++$rank,
                'username' => $row['username'],
                'count' => $row['count'],
                'minor' => $row['minor'],
                'first_edit' => [
                    'id' => $row['first_revid'],
                    'timestamp' => $row['first_timestamp'],
                ],
                'latest_edit' => [
                    'id' => $row['latest_revid'],
                    'timestamp' => $row['latest_timestamp'],
                ],
            ];
        }

        return $topEditors;
    }

    /**
     * Get prose and reference information.
     * @return array With keys 'characters', 'words', 'references', 'unique_references', 'sections'
     */
    public function getProseStats(): array
    {
        if (isset($this->proseStats)) {
            return $this->proseStats;
        }

        $datetime = is_int($this->end) ? new DateTime("@$this->end") : null;
        $html = $this->page->getHTMLContent($datetime);

        $crawler = new Crawler($html);

        [$chars, $words] = $this->countCharsAndWords($crawler, '#mw-content-text p');

        $refs = $crawler->filter('#mw-content-text .reference');
        $refContent = [];
        $refs->each(function ($ref) use (&$refContent): void {
            $refContent[] = $ref->text();
        });
        $uniqueRefs = count(array_unique($refContent));

        $sections = count($crawler->filter('#mw-content-text .mw-headline'));

        $this->proseStats = [
            'characters' => $chars,
            'words' => $words,
            'references' => $refs->count(),
            'unique_references' => $uniqueRefs,
            'sections' => $sections,
        ];
        return $this->proseStats;
    }

    /**
     * Count the number of characters and words of the plain text within the DOM element matched by the given selector.
     * @param Crawler $crawler
     * @param string $selector HTML selector.
     * @return array [num chars, num words]
     */
    private function countCharsAndWords(Crawler $crawler, string $selector): array
    {
        $totalChars = 0;
        $totalWords = 0;
        $paragraphs = $crawler->filter($selector);
        $paragraphs->each(function ($node) use (&$totalChars, &$totalWords): void {
            /** @var Crawler $node */
            $text = preg_replace('/\[\d+]/', '', trim($node->text(null, true)));
            $totalChars += strlen($text);
            $totalWords += count(explode(' ', $text));
        });

        return [$totalChars, $totalWords];
    }

    /**
     * Get the page assessments of the page.
     * @see https://www.mediawiki.org/wiki/Extension:PageAssessments
     * @return string[]|false False if unsupported.
     * @codeCoverageIgnore
     */
    public function getAssessments()
    {
        if (!is_array($this->assessments)) {
The condition is_array($this->assessments) is always true.
            $this->assessments = $this->page
                ->getProject()
                ->getPageAssessments()
                ->getAssessments($this->page);
        }
        return $this->assessments;
    }

    /**
     * Get the list of the page's Wikidata and CheckWiki errors.
     * @see Page::getErrors()
     * @return string[]
     */
    public function getBugs(): array
    {
        if (!is_array($this->bugs)) {
The condition is_array($this->bugs) is always true.
            $this->bugs = $this->page->getErrors();
        }
        return $this->bugs;
    }

    /**
     * Get the number of Wikidata and CheckWiki errors.
     * @return int
     */
    public function numBugs(): int
    {
        return count($this->getBugs());
    }

    /**
     * Generate the data structure that will be used in the ArticleInfo API response.
     * @param Project $project
     * @param Page $page
     * @return array
     * @codeCoverageIgnore
     */
    public function getArticleInfoApiData(Project $project, Page $page): array
    {
        /** @var int $pageviewsOffset Number of days to query for pageviews */
        $pageviewsOffset = 30;

        $data = [
            'project' => $project->getDomain(),
            'page' => $page->getTitle(),
            'watchers' => (int) $page->getWatchers(),
            'pageviews' => $page->getLastPageviews($pageviewsOffset),
            'pageviews_offset' => $pageviewsOffset,
        ];

        $info = false;

        try {
            $articleInfoRepo = new ArticleInfoRepository();
            $articleInfoRepo->setContainer($this->container);
            $info = $articleInfoRepo->getBasicEditingInfo($page);
        } catch (ServiceUnavailableHttpException $e) {
            // No more open database connections.
            $data['error'] = 'Unable to fetch revision data. Please try again later.';
        } catch (HttpException $e) {
            /**
             * The query most likely exceeded the maximum query time,
             * so we'll abort and give only info retrieved by the API.
             */
            $data['error'] = 'Unable to fetch revision data. The query may have timed out.';
        }

        if (false !== $info) {
            $creationDateTime = DateTime::createFromFormat('YmdHis', $info['created_at']);
            $modifiedDateTime = DateTime::createFromFormat('YmdHis', $info['modified_at']);
            $secsSinceLastEdit = (new DateTime)->getTimestamp() - $modifiedDateTime->getTimestamp();

            // Some wikis (such as foundation.wikimedia.org) may be missing the creation date.
            $creationDateTime = false === $creationDateTime
                ? null
                : $creationDateTime->format('Y-m-d');

            $assessment = $page->getProject()
                ->getPageAssessments()
                ->getAssessment($page);

            $data = array_merge($data, [
                'revisions' => (int) $info['num_edits'],
                'editors' => (int) $info['num_editors'],
                'minor_edits' => (int) $info['minor_edits'],
                'author' => $info['author'],
                'author_editcount' => null === $info['author_editcount'] ? null : (int) $info['author_editcount'],
The condition null === $info['author_editcount'] is always false.
                'created_at' => $creationDateTime,
                'created_rev_id' => $info['created_rev_id'],
                'modified_at' => $modifiedDateTime->format('Y-m-d H:i'),
                'secs_since_last_edit' => $secsSinceLastEdit,
                'last_edit_id' => (int) $info['modified_rev_id'],
                'assessment' => $assessment,
            ]);
        }

        return $data;
    }

    /************************ Link statistics ************************/

    /**
     * Get the number of external links on the page.
     * @return int
     */
    public function linksExtCount(): int
    {
        return $this->getLinksAndRedirects()['links_ext_count'];
    }

    /**
     * Get the number of incoming links to the page.
     * @return int
     */
    public function linksInCount(): int
    {
        return $this->getLinksAndRedirects()['links_in_count'];
    }

    /**
     * Get the number of outgoing links from the page.
     * @return int
     */
    public function linksOutCount(): int
    {
        return $this->getLinksAndRedirects()['links_out_count'];
    }

    /**
     * Get the number of redirects to the page.
     * @return int
     */
    public function redirectsCount(): int
    {
        return $this->getLinksAndRedirects()['redirects_count'];
    }

    /**
     * Get the number of external, incoming and outgoing links, along with the number of redirects to the page.
     * @return int[]
     * @codeCoverageIgnore
     */
    private function getLinksAndRedirects(): array
    {
        if (!is_array($this->linksAndRedirects)) {
The condition is_array($this->linksAndRedirects) is always true.
            $this->linksAndRedirects = $this->page->countLinksAndRedirects();
        }
        return $this->linksAndRedirects;
    }

    /**
     * Fetch transclusion data (categories, templates and files) that are on the page.
     * @return array With keys 'categories', 'templates' and 'files'.
     */
    public function getTransclusionData(): array
    {
        if (!is_array($this->transclusionData)) {
The condition is_array($this->transclusionData) is always true.
            $this->transclusionData = $this->getRepository()
                ->getTransclusionData($this->page);
        }
        return $this->transclusionData;
    }

    /**
     * Get the number of categories that are on the page.
     * @return int
     */
    public function getNumCategories(): int
    {
        return $this->getTransclusionData()['categories'];
    }

    /**
     * Get the number of templates that are on the page.
     * @return int
     */
    public function getNumTemplates(): int
    {
        return $this->getTransclusionData()['templates'];
    }

    /**
     * Get the number of files that are on the page.
     * @return int
     */
    public function getNumFiles(): int
    {
        return $this->getTransclusionData()['files'];
    }

    /************************ Bot statistics ************************/

    /**
     * Number of edits made to the page by current or former bots.
     * @param string[] $bots Used only in unit tests, where we supply mock data for the bots that will get processed.
     * @return int
     */
    public function getBotRevisionCount(?array $bots = null): int
    {
        if (isset($this->botRevisionCount)) {
            return $this->botRevisionCount;
        }

        if (null === $bots) {
            $bots = $this->getBots();
        }

        $count = 0;

        foreach (array_values($bots) as $data) {
            $count += $data['count'];
        }

        $this->botRevisionCount = $count;
        return $count;
    }

    /**
     * Get data about bots that edited the page, caching it in $this->bots. This is done separately from the main
     * query because we use this information when computing the top 10 editors in ArticleInfo, where we don't want
     * to include bots.
     * @return mixed[]
     */
    public function getBots(): array
    {
        if (isset($this->bots)) {
            return $this->bots;
        }

        // Parse the bot edits.
        $this->bots = [];

        $limit = $this->tooManyRevisions() ? $this->getMaxRevisions() : null;

        /** @var ResultStatement $botData */
        $botData = $this->getRepository()->getBotData($this->page, $this->start, $this->end, $limit);
Bug: The method getBotData() does not exist on App\Repository\Repository. It seems like you code against a sub-type of App\Repository\Repository such as App\Repository\ArticleInfoRepository.
If this is a false-positive, you can also ignore this issue in your code via the ignore-call annotation:
        $botData = $this->getRepository()->/** @scrutinizer ignore-call */ getBotData($this->page, $this->start, $this->end, $limit);
        while ($bot = $botData->fetchAssociative()) {
Bug: The method fetchAssociative() does not exist on Doctrine\DBAL\Driver\ResultStatement. It seems like you code against a sub-type of said class. However, the method does not exist in Doctrine\DBAL\Driver\Statement. Are you sure you never get one of those?
If this is a false-positive, you can also ignore this issue in your code via the ignore-call annotation:
        while ($bot = $botData->/** @scrutinizer ignore-call */ fetchAssociative()) {
            $this->bots[$bot['username']] = [
                'count' => (int)$bot['count'],
                'current' => '1' === $bot['current'],
            ];
        }

        // Sort by edit count.
        uasort($this->bots, function ($a, $b) {
            return $b['count'] - $a['count'];
        });

        return $this->bots;
    }

    /**
     * Get the number of bots that edited the page.
     * @return int
     */
    public function getNumBots(): int
    {
        return count($this->getBots());
    }
}
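
The "condition is always true" findings in the listing (for example in getBugs()) most likely arise because the property is documented as always being an array, so the analyzer concludes that is_array() cannot fail. One possible way to address this, sketched here with a hypothetical BugCache class and a simulated lookup, is to document the property as nullable and compare against null instead. This is an assumption about the cause, not a confirmed fix from the project.

```php
<?php
declare(strict_types = 1);

/** Hypothetical, stripped-down stand-in for the memoization used in ArticleInfoApi. */
class BugCache
{
    /** @var string[]|null List of errors, or null if they have not been fetched yet. */
    private $bugs;

    /** @var int Number of times the simulated expensive lookup ran. */
    public $lookups = 0;

    /**
     * Return the cached list, fetching it on first use only.
     * @return string[]
     */
    public function getBugs(): array
    {
        if (null === $this->bugs) {
            $this->bugs = $this->fetchErrors();
        }
        return $this->bugs;
    }

    /**
     * Stand-in for Page::getErrors(); returns fixed sample data.
     * @return string[]
     */
    private function fetchErrors(): array
    {
        $this->lookups++;
        return ['Example error'];
    }
}

$cache = new BugCache();
$cache->getBugs();
$cache->getBugs();
echo $cache->lookups, PHP_EOL; // the lookup ran only once
```

With the nullable annotation, the declared type and the runtime state before the first call agree, so the guard condition is no longer trivially true from the analyzer's point of view.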