Passed
Push — master ( 0e30e3...7b249e )
by MusikAnimal
07:20
created

ArticleInfoApi::getProseStats()   A

Complexity

Conditions 3
Paths 3

Size

Total Lines 30
Code Lines 19

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 20
CRAP Score 3.0009

Importance

Changes 1
Bugs 0 Features 0
Metric Value
cc 3
eloc 19
nc 3
nop 0
dl 0
loc 30
rs 9.6333
c 1
b 0
f 0
ccs 20
cts 21
cp 0.9524
crap 3.0009
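The CRAP score above can be reproduced from the other metrics. The following is a minimal sketch, assuming the report uses the commonly cited Change Risk Anti-Patterns formula, complexity^2 * (1 - coverage)^3 + complexity, with "cc" as the complexity and "ccs" / "cts" as the coverage. The function name crapScore is hypothetical and not part of the code under review.

```php
<?php
declare(strict_types = 1);

/**
 * Change Risk Anti-Patterns score for a single method (hypothetical helper).
 * @param int $complexity Cyclomatic complexity (the "cc" metric).
 * @param float $coverage Fraction of statements covered by tests, from 0.0 to 1.0.
 * @return float
 */
function crapScore(int $complexity, float $coverage): float
{
    // Uncovered, complex code is penalised; fully covered code scores its complexity.
    return $complexity ** 2 * (1 - $coverage) ** 3 + $complexity;
}

// Complexity 3 with 20 of 21 statements covered, as in the metrics above.
printf("%.5f\n", crapScore(3, 20 / 21)); // prints 3.00097
```

Under that assumption the result agrees with the reported 3.0009 once truncated to four decimal places.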
<?php
declare(strict_types = 1);

namespace AppBundle\Model;

use AppBundle\Repository\ArticleInfoRepository;
use DateTime;
use Doctrine\DBAL\Statement;
use Symfony\Component\DependencyInjection\ContainerInterface;
use Symfony\Component\DomCrawler\Crawler;
use Symfony\Component\HttpKernel\Exception\HttpException;
use Symfony\Component\HttpKernel\Exception\ServiceUnavailableHttpException;

/**
 * An ArticleInfoApi is standalone logic for the Article Info tool. These methods perform SQL queries
 * or make API requests and can be called directly, without any knowledge of the child ArticleInfo class.
 * It does require that the ArticleInfoRepository be set, however.
 * @see ArticleInfo
 */
class ArticleInfoApi extends Model
{
    /** @var ContainerInterface The application's DI container. */
    protected $container;

    /** @var int Number of revisions that belong to the page. */
    protected $numRevisions;

    /** @var mixed[] Prose stats, with keys 'characters', 'words', 'references', 'unique_references', 'sections'. */
    protected $proseStats;

    /** @var array Number of categories, templates and files on the page. */
    protected $transclusionData;

    /** @var mixed[] Various statistics about bots that edited the page. */
    protected $bots;

    /** @var int Number of edits made to the page by bots. */
    protected $botRevisionCount;

    /** @var int[] Number of in and outgoing links and redirects to the page. */
    protected $linksAndRedirects;

    /** @var string[] Assessments of the page (see Page::getAssessments). */
    protected $assessments;

    /** @var string[] List of Wikidata and Checkwiki errors. */
    protected $bugs;

    /**
     * ArticleInfoApi constructor.
     * @param Page $page The page to process.
     * @param ContainerInterface $container The DI container.
     * @param false|int $start From what date to obtain records.
     * @param false|int $end To what date to obtain records.
     */
    public function __construct(Page $page, ContainerInterface $container, $start = false, $end = false)
    {
        $this->page = $page;
        $this->container = $container;
        $this->start = $start;
        $this->end = $end;
    }

    /**
     * Get the start of the date range, formatted as used in the views.
     * @return string Blank if no value exists.
     */
    public function getStartDate(): string
    {
        // Scrutinizer: $this->start can also be a boolean or a string, but the $timestamp parameter of
        // date() only accepts an integer; consider an additional type check (or silence the report
        // with a "@scrutinizer ignore-type" annotation if it is a false positive).
        return '' == $this->start ? '' : date('Y-m-d', $this->start);
    }

    /**
     * Get the end of the date range, formatted as used in the views.
     * @return string Blank if no value exists.
     */
    public function getEndDate(): string
    {
        // Scrutinizer: $this->end can also be a boolean or a string, but the $timestamp parameter of
        // date() only accepts an integer; consider an additional type check (or silence the report
        // with a "@scrutinizer ignore-type" annotation if it is a false positive).
        return '' == $this->end ? '' : date('Y-m-d', $this->end);
    }

    /**
     * Get the number of revisions belonging to the page.
     * @return int
     */
    public function getNumRevisions(): int
    {
        if (!isset($this->numRevisions)) {
            // Scrutinizer: $this->start and $this->end can also be strings, but Page::getNumRevisions()
            // only accepts false|int for $start and $end; consider an additional type check.
            $this->numRevisions = $this->page->getNumRevisions(null, $this->start, $this->end);
        }
        return $this->numRevisions;
    }

    /**
     * Get various basic info used in the API, including the number of revisions, unique authors, initial author
     * and edit count of the initial author. This is combined into one query for better performance. Caching is
     * intentionally disabled: the gadget hits this for a different page almost every time, so the cache
     * would rarely help.
     * @return string[]|false false if the page was not found.
     */
    public function getBasicEditingInfo()
    {
        // Scrutinizer: getBasicEditingInfo() does not exist on the base Repository class; this relies on
        // the repository being a sub-type such as ArticleInfoRepository.
        return $this->getRepository()->getBasicEditingInfo($this->page);
    }

    /**
     * Get the top editors to the page by edit count.
     * @param int $limit Default 20, maximum 1,000.
     * @param bool $noBots Set to non-false to exclude bots from the result.
     * @return array
     */
    public function getTopEditorsByEditCount(int $limit = 20, bool $noBots = false): array
    {
        // Quick cache, valid only for the same request. As a static variable it is shared by all
        // instances and ignores the arguments of later calls.
        static $topEditors = null;
        if (null !== $topEditors) {
            return $topEditors;
        }

        // Scrutinizer: getTopEditorsByEditCount() does not exist on the base Repository class; this relies
        // on the repository being a sub-type such as ArticleInfoRepository.
        $rows = $this->getRepository()->getTopEditorsByEditCount(
            $this->page,
            $this->start,
            $this->end,
            min($limit, 1000),
            $noBots
        );

        $topEditors = [];
        $rank = 0;
        foreach ($rows as $row) {
            $topEditors[] = [
                'rank' => ++$rank,
                'username' => $row['username'],
                'count' => $row['count'],
                'minor' => $row['minor'],
                'first_edit' => [
                    'id' => $row['first_revid'],
                    'timestamp' => $row['first_timestamp'],
                ],
                'latest_edit' => [
                    'id' => $row['latest_revid'],
                    'timestamp' => $row['latest_timestamp'],
                ],
            ];
        }

        return $topEditors;
    }

    /**
     * Get prose and reference information.
     * @return array With keys 'characters', 'words', 'references', 'unique_references', 'sections'.
     */
    public function getProseStats(): array
    {
        if (isset($this->proseStats)) {
            return $this->proseStats;
        }

        // Scrutinizer: $this->end is of type integer|string|true here and is used in a concatenation;
        // add a type check or a "@scrutinizer ignore-type" annotation if this is a false positive.
        $datetime = false !== $this->end ? new DateTime('@'.$this->end) : null;
        $html = $this->page->getHTMLContent($datetime);

        $crawler = new Crawler($html);

        [$chars, $words] = $this->countCharsAndWords($crawler, '#mw-content-text p');

        $refs = $crawler->filter('#mw-content-text .reference');
        $refContent = [];
        $refs->each(function ($ref) use (&$refContent): void {
            $refContent[] = $ref->text();
        });
        $uniqueRefs = count(array_unique($refContent));

        $sections = count($crawler->filter('#mw-content-text .mw-headline'));

        $this->proseStats = [
            'characters' => $chars,
            'words' => $words,
            'references' => $refs->count(),
            'unique_references' => $uniqueRefs,
            'sections' => $sections,
        ];
        return $this->proseStats;
    }

    /**
     * Count the number of characters and words of the plain text within the DOM element matched by the given selector.
     * @param Crawler $crawler
     * @param string $selector HTML selector.
     * @return array [num chars, num words]
     */
    private function countCharsAndWords(Crawler $crawler, string $selector): array
    {
        $totalChars = 0;
        $totalWords = 0;
        $paragraphs = $crawler->filter($selector);
        $paragraphs->each(function ($node) use (&$totalChars, &$totalWords): void {
            // Strip citation markers such as [1] before counting.
            $text = preg_replace('/\[\d+]/', '', trim($node->text()));
            // Note that strlen() counts bytes, so a multibyte character counts more than once.
            $totalChars += strlen($text);
            $totalWords += count(explode(' ', $text));
        });

        return [$totalChars, $totalWords];
    }

    /**
     * Get the page assessments of the page.
     * @see https://www.mediawiki.org/wiki/Extension:PageAssessments
     * @return string[]|false False if unsupported.
     * @codeCoverageIgnore
     */
    public function getAssessments()
    {
        // Scrutinizer: the condition is_array($this->assessments) is reported as always true.
        if (!is_array($this->assessments)) {
            $this->assessments = $this->page
                ->getProject()
                ->getPageAssessments()
                ->getAssessments($this->page);
        }
        return $this->assessments;
    }

    /**
     * Get the list of the page's Wikidata and Checkwiki errors.
     * @see Page::getErrors()
     * @return string[]
     */
    public function getBugs(): array
    {
        // Scrutinizer: the condition is_array($this->bugs) is reported as always true.
        if (!is_array($this->bugs)) {
            $this->bugs = $this->page->getErrors();
        }
        return $this->bugs;
    }

    /**
     * Get the number of Wikidata and CheckWiki errors.
     * @return int
     */
    public function numBugs(): int
    {
        return count($this->getBugs());
    }

    /**
     * Generate the data structure that will be used in the ArticleInfo API response.
     * @param Project $project
     * @param Page $page
     * @return array
     * @codeCoverageIgnore
     */
    public function getArticleInfoApiData(Project $project, Page $page): array
    {
        /** @var int $pageviewsOffset Number of days to query for pageviews */
        $pageviewsOffset = 30;

        $data = [
            'project' => $project->getDomain(),
            'page' => $page->getTitle(),
            'watchers' => (int) $page->getWatchers(),
            'pageviews' => $page->getLastPageviews($pageviewsOffset),
            'pageviews_offset' => $pageviewsOffset,
        ];

        $info = false;

        try {
            $articleInfoRepo = new ArticleInfoRepository();
            $articleInfoRepo->setContainer($this->container);
            $info = $articleInfoRepo->getBasicEditingInfo($page);
        } catch (ServiceUnavailableHttpException $e) {
            // No more open database connections.
            $data['error'] = 'Unable to fetch revision data. Please try again later.';
        } catch (HttpException $e) {
            /**
             * The query most likely exceeded the maximum query time,
             * so we'll abort and give only info retrieved by the API.
             */
            $data['error'] = 'Unable to fetch revision data. The query may have timed out.';
        }

        if (false !== $info) {
            $creationDateTime = DateTime::createFromFormat('YmdHis', $info['created_at']);
            $modifiedDateTime = DateTime::createFromFormat('YmdHis', $info['modified_at']);
            $secsSinceLastEdit = (new DateTime)->getTimestamp() - $modifiedDateTime->getTimestamp();

            // Some wikis (such as foundation.wikimedia.org) may be missing the creation date.
            $creationDateTime = false === $creationDateTime
                ? null
                : $creationDateTime->format('Y-m-d');

            $assessment = $page->getProject()
                ->getPageAssessments()
                ->getAssessment($page);

            $data = array_merge($data, [
                'revisions' => (int) $info['num_edits'],
                'editors' => (int) $info['num_editors'],
                'minor_edits' => (int) $info['minor_edits'],
                'author' => $info['author'],
                'author_editcount' => (int) $info['author_editcount'],
                'created_at' => $creationDateTime,
                'created_rev_id' => $info['created_rev_id'],
                'modified_at' => $modifiedDateTime->format('Y-m-d H:i'),
                'secs_since_last_edit' => $secsSinceLastEdit,
                'last_edit_id' => (int) $info['modified_rev_id'],
                'assessment' => $assessment,
            ]);
        }

        return $data;
    }

    /************************ Link statistics ************************/

    /**
     * Get the number of external links on the page.
     * @return int
     */
    public function linksExtCount(): int
    {
        return $this->getLinksAndRedirects()['links_ext_count'];
    }

    /**
     * Get the number of incoming links to the page.
     * @return int
     */
    public function linksInCount(): int
    {
        return $this->getLinksAndRedirects()['links_in_count'];
    }

    /**
     * Get the number of outgoing links from the page.
     * @return int
     */
    public function linksOutCount(): int
    {
        return $this->getLinksAndRedirects()['links_out_count'];
    }

    /**
     * Get the number of redirects to the page.
     * @return int
     */
    public function redirectsCount(): int
    {
        return $this->getLinksAndRedirects()['redirects_count'];
    }

    /**
     * Get the number of external, incoming and outgoing links, along with the number of redirects to the page.
     * @return int[]
     * @codeCoverageIgnore
     */
    private function getLinksAndRedirects(): array
    {
        // Scrutinizer: the condition is_array($this->linksAndRedirects) is reported as always true.
        if (!is_array($this->linksAndRedirects)) {
            $this->linksAndRedirects = $this->page->countLinksAndRedirects();
        }
        return $this->linksAndRedirects;
    }

    /**
     * Fetch transclusion data (categories, templates and files) that are on the page.
     * @return array With keys 'categories', 'templates' and 'files'.
     */
    public function getTransclusionData(): array
    {
        // Scrutinizer: the condition is_array($this->transclusionData) is reported as always true.
        if (!is_array($this->transclusionData)) {
            $this->transclusionData = $this->getRepository()
                ->getTransclusionData($this->page);
        }
        return $this->transclusionData;
    }

    /**
     * Get the number of categories that are on the page.
     * @return int
     */
    public function getNumCategories(): int
    {
        return $this->getTransclusionData()['categories'];
    }

    /**
     * Get the number of templates that are on the page.
     * @return int
     */
    public function getNumTemplates(): int
    {
        return $this->getTransclusionData()['templates'];
    }

    /**
     * Get the number of files that are on the page.
     * @return int
     */
    public function getNumFiles(): int
    {
        return $this->getTransclusionData()['files'];
    }

    /************************ Bot statistics ************************/

    /**
     * Number of edits made to the page by current or former bots.
     * @param string[] $bots Used only in unit tests, where we supply mock data for the bots that will get processed.
     * @return int
     */
    public function getBotRevisionCount(?array $bots = null): int
    {
        if (isset($this->botRevisionCount)) {
            return $this->botRevisionCount;
        }

        if (null === $bots) {
            $bots = $this->getBots();
        }

        $count = 0;

        foreach (array_values($bots) as $data) {
            $count += $data['count'];
        }

        $this->botRevisionCount = $count;
        return $count;
    }

    /**
     * Get and set $this->bots, the data about bots that edited the page. This is a separate method because
     * we need this information when computing the top 10 editors in ArticleInfo, where we don't want to
     * include bots.
     * @return mixed[]
     */
    public function getBots(): array
    {
        if (isset($this->bots)) {
            return $this->bots;
        }

        // Parse the bot edits.
        $this->bots = [];

        // Scrutinizer: getBotData() does not exist on the base Repository class; this relies on the
        // repository being a sub-type such as ArticleInfoRepository.
        /** @var Statement $botData */
        $botData = $this->getRepository()->getBotData($this->page, $this->start, $this->end);
        while ($bot = $botData->fetch()) {
            $this->bots[$bot['username']] = [
                'count' => (int)$bot['count'],
                'current' => '1' === $bot['current'],
            ];
        }

        // Sort by edit count, highest first.
        uasort($this->bots, function ($a, $b) {
            return $b['count'] <=> $a['count'];
        });

        return $this->bots;
    }

    /**
     * Get the number of bots that edited the page.
     * @return int
     */
    public function getNumBots(): int
    {
        return count($this->getBots());
    }
}
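countCharsAndWords() measures length with strlen(), which counts bytes, and splits on single spaces, so an empty paragraph still counts as one word. The following is a minimal sketch of a multibyte-aware alternative, assuming the mbstring extension is available and the text is UTF-8. The function name countCharsAndWordsInText is hypothetical, and it takes a plain string rather than a Crawler so that it can run on its own.

```php
<?php
declare(strict_types = 1);

/**
 * Count characters and words in a piece of plain text (hypothetical helper).
 * @param string $text UTF-8 text, e.g. the text of one paragraph node.
 * @return int[] [num chars, num words]
 */
function countCharsAndWordsInText(string $text): array
{
    // Same citation-marker stripping as countCharsAndWords(), e.g. "[1]".
    $text = preg_replace('/\[\d+]/', '', trim($text)) ?? '';

    // mb_strlen() counts characters rather than bytes.
    $chars = mb_strlen($text, 'UTF-8');

    // Split on runs of whitespace and drop empty pieces, so '' yields zero words.
    $words = count(preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: []);

    return [$chars, $words];
}

var_dump(countCharsAndWordsInText('Café au lait[1]'));
```

Swapping this in would change the reported numbers for non-ASCII text, so any tests that pin byte counts would need to be updated alongside it.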
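The analyzer notes on getStartDate() and getEndDate() point out that the date bounds may arrive as booleans or strings while date() expects an integer. The following is a minimal sketch of the suggested type check, not the author's fix. The function name formatDateBound is hypothetical, and it uses gmdate() instead of date() only so that the output does not depend on the configured timezone.

```php
<?php
declare(strict_types = 1);

/**
 * Format one bound of a date range for display (hypothetical helper).
 * @param mixed $timestamp Unix timestamp as an int or numeric string; false or '' when unset.
 * @return string 'Y-m-d' in UTC, or blank if no usable value exists.
 */
function formatDateBound($timestamp): string
{
    // Reject booleans, null, '' and any non-numeric string up front.
    if (is_bool($timestamp) || !is_numeric($timestamp)) {
        return '';
    }

    // The cast guarantees that gmdate() receives an integer.
    return gmdate('Y-m-d', (int) $timestamp);
}

echo formatDateBound(1500000000), "\n";   // 2017-07-14
echo formatDateBound('1500000000'), "\n"; // 2017-07-14
var_dump(formatDateBound(false));         // string(0) ""
```

Unlike the loose comparison '' == $this->start, this check treats a timestamp of 0 as a real date and gives the same result on PHP 7 and PHP 8, where the comparison of 0 with '' changed.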