Passed
Push — master ( ed4035...f2dd38 )
by MusikAnimal
10:32
created

ArticleInfoApi::linksInCount()   A

Complexity

Conditions 1
Paths 1

Size

Total Lines 3
Code Lines 1

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 2
CRAP Score 1

Importance

Changes 1
Bugs 0 Features 0
Metric Value
cc 1
eloc 1
nc 1
nop 0
dl 0
loc 3
rs 10
c 1
b 0
f 0
ccs 2
cts 2
cp 1
crap 1
<?php
declare(strict_types = 1);

namespace AppBundle\Model;

use AppBundle\Repository\ArticleInfoRepository;
use DateTime;
use Doctrine\DBAL\Statement;
use Symfony\Component\DependencyInjection\ContainerInterface;
use Symfony\Component\DomCrawler\Crawler;
use Symfony\Component\HttpKernel\Exception\HttpException;
use Symfony\Component\HttpKernel\Exception\ServiceUnavailableHttpException;

/**
 * An ArticleInfoApi is standalone logic for the Article Info tool. These methods perform SQL queries
 * or make API requests and can be called directly, without any knowledge of the child ArticleInfo class.
 * It does require that the ArticleInfoRepository be set, however.
 * @see ArticleInfo
 */
class ArticleInfoApi extends Model
{
    /** @var ContainerInterface The application's DI container. */
    protected $container;

    /** @var int Number of revisions that belong to the page. */
    protected $numRevisions;

    /** @var mixed[] Prose stats, with keys 'characters', 'words', 'references', 'unique_references', 'sections'. */
    protected $proseStats;

    /** @var array Number of categories, templates and files on the page. */
    protected $transclusionData;

    /** @var mixed[] Various statistics about bots that edited the page. */
    protected $bots;

    /** @var int Number of edits made to the page by bots. */
    protected $botRevisionCount;

    /** @var int[] Number of incoming and outgoing links and redirects to the page. */
    protected $linksAndRedirects;

    /** @var string[] Assessments of the page (see Page::getAssessments). */
    protected $assessments;

    /** @var string[] List of Wikidata and CheckWiki errors. */
    protected $bugs;

    /**
     * ArticleInfoApi constructor.
     * @param Page $page The page to process.
     * @param ContainerInterface $container The DI container.
     * @param false|int $start From what date to obtain records.
     * @param false|int $end To what date to obtain records.
     */
    public function __construct(Page $page, ContainerInterface $container, $start = false, $end = false)
    {
        $this->page = $page;
        $this->container = $container;
        $this->start = $start;
        $this->end = $end;
    }

    /**
     * Get the opening date of the date range, formatted as used in the views.
     * @return string Blank if no value exists.
     */
    public function getStartDate(): string
    {
        // Scrutinizer issue: $this->start can also be of type boolean or string, but parameter
        // $timestamp of date() only accepts integer|null; consider an additional type check.
        return '' == $this->start ? '' : date('Y-m-d', $this->start);
    }

    /**
     * Get the closing date of the date range, formatted as used in the views.
     * @return string Blank if no value exists.
     */
    public function getEndDate(): string
    {
        // Scrutinizer issue: $this->end can also be of type boolean or string, but parameter
        // $timestamp of date() only accepts integer|null; consider an additional type check.
        return '' == $this->end ? '' : date('Y-m-d', $this->end);
    }

    /**
     * Get the number of revisions belonging to the page.
     * @return int
     */
    public function getNumRevisions(): int
    {
        if (!isset($this->numRevisions)) {
            // Scrutinizer issue: $this->start and $this->end can also be of type string, but parameters
            // $start and $end of Page::getNumRevisions() only accept false|integer; consider a type check.
            $this->numRevisions = $this->page->getNumRevisions(null, $this->start, $this->end);
        }
        return $this->numRevisions;
    }

    /**
     * Get various basic info used in the API, including the number of revisions, unique authors, initial author
     * and edit count of the initial author. This is combined into one query for better performance. Caching is
     * intentionally disabled: the gadget requests this for a different page almost every time, so the
     * likelihood of a cache benefiting us is slim.
     * @return string[]|false false if the page was not found.
     */
    public function getBasicEditingInfo()
    {
        // Scrutinizer issue: getBasicEditingInfo() does not exist on AppBundle\Repository\Repository;
        // the code relies on the sub-type AppBundle\Repository\ArticleInfoRepository.
        return $this->getRepository()->getBasicEditingInfo($this->page);
    }

    /**
     * Get the top editors to the page by edit count.
     * @param int $limit Default 20, maximum 1,000.
     * @param bool $noBots Set to true to exclude bots from the result.
     * @return array
     */
    public function getTopEditorsByEditCount(int $limit = 20, bool $noBots = false): array
    {
        // Quick cache, valid only for the same request.
        static $topEditors = null;
        if (null !== $topEditors) {
            return $topEditors;
        }

        // Scrutinizer issue: getTopEditorsByEditCount() does not exist on AppBundle\Repository\Repository;
        // the code relies on the sub-type AppBundle\Repository\ArticleInfoRepository.
        $rows = $this->getRepository()->getTopEditorsByEditCount(
            $this->page,
            $this->start,
            $this->end,
            min($limit, 1000),
            $noBots
        );

        $topEditors = [];
        $rank = 0;
        foreach ($rows as $row) {
            $topEditors[] = [
                'rank' => ++$rank,
                'username' => $row['username'],
                'count' => $row['count'],
                'minor' => $row['minor'],
                'first_edit' => [
                    'id' => $row['first_revid'],
                    'timestamp' => $row['first_timestamp'],
                ],
                'latest_edit' => [
                    'id' => $row['latest_revid'],
                    'timestamp' => $row['latest_timestamp'],
                ],
            ];
        }

        return $topEditors;
    }

    /**
     * Get prose and reference information.
     * @return array With keys 'characters', 'words', 'references', 'unique_references', 'sections'.
     */
    public function getProseStats(): array
    {
        if (isset($this->proseStats)) {
            return $this->proseStats;
        }

        // Scrutinizer issue: $this->end is inferred as integer|string|true here; check that it is
        // safe to use in concatenation.
        $datetime = false !== $this->end ? new DateTime('@'.$this->end) : null;
        $html = $this->page->getHTMLContent($datetime);

        $crawler = new Crawler($html);

        [$chars, $words] = $this->countCharsAndWords($crawler, '#mw-content-text p');

        $refs = $crawler->filter('#mw-content-text .reference');
        $refContent = [];
        $refs->each(function ($ref) use (&$refContent): void {
            $refContent[] = $ref->text();
        });
        $uniqueRefs = count(array_unique($refContent));

        $sections = count($crawler->filter('#mw-content-text .mw-headline'));

        $this->proseStats = [
            'characters' => $chars,
            'words' => $words,
            'references' => $refs->count(),
            'unique_references' => $uniqueRefs,
            'sections' => $sections,
        ];
        return $this->proseStats;
    }

    /**
     * Count the number of characters and words of the plain text within the DOM element matched by the given selector.
     * @param Crawler $crawler
     * @param string $selector HTML selector.
     * @return array [num chars, num words]
     */
    private function countCharsAndWords(Crawler $crawler, string $selector): array
    {
        $totalChars = 0;
        $totalWords = 0;
        $paragraphs = $crawler->filter($selector);
        $paragraphs->each(function ($node) use (&$totalChars, &$totalWords): void {
            /** @var Crawler $node */
            $text = preg_replace('/\[\d+]/', '', trim($node->text(null, true)));
            $totalChars += strlen($text);
            $totalWords += count(explode(' ', $text));
        });

        return [$totalChars, $totalWords];
    }

    /**
     * Get the page assessments of the page.
     * @see https://www.mediawiki.org/wiki/Extension:PageAssessments
     * @return string[]|false False if unsupported.
     * @codeCoverageIgnore
     */
    public function getAssessments()
    {
        // Scrutinizer issue: the condition is_array($this->assessments) is reported as always true.
        if (!is_array($this->assessments)) {
            $this->assessments = $this->page
                ->getProject()
                ->getPageAssessments()
                ->getAssessments($this->page);
        }
        return $this->assessments;
    }

    /**
     * Get the list of the page's Wikidata and CheckWiki errors.
     * @see Page::getErrors()
     * @return string[]
     */
    public function getBugs(): array
    {
        // Scrutinizer issue: the condition is_array($this->bugs) is reported as always true.
        if (!is_array($this->bugs)) {
            $this->bugs = $this->page->getErrors();
        }
        return $this->bugs;
    }

    /**
     * Get the number of Wikidata and CheckWiki errors.
     * @return int
     */
    public function numBugs(): int
    {
        return count($this->getBugs());
    }

    /**
     * Generate the data structure that will be used in the ArticleInfo API response.
     * @param Project $project
     * @param Page $page
     * @return array
     * @codeCoverageIgnore
     */
    public function getArticleInfoApiData(Project $project, Page $page): array
    {
        /** @var int $pageviewsOffset Number of days to query for pageviews */
        $pageviewsOffset = 30;

        $data = [
            'project' => $project->getDomain(),
            'page' => $page->getTitle(),
            'watchers' => (int) $page->getWatchers(),
            'pageviews' => $page->getLastPageviews($pageviewsOffset),
            'pageviews_offset' => $pageviewsOffset,
        ];

        $info = false;

        try {
            $articleInfoRepo = new ArticleInfoRepository();
            $articleInfoRepo->setContainer($this->container);
            $info = $articleInfoRepo->getBasicEditingInfo($page);
        } catch (ServiceUnavailableHttpException $e) {
            // No more open database connections.
            $data['error'] = 'Unable to fetch revision data. Please try again later.';
        } catch (HttpException $e) {
            /**
             * The query most likely exceeded the maximum query time,
             * so we'll abort and give only info retrieved by the API.
             */
            $data['error'] = 'Unable to fetch revision data. The query may have timed out.';
        }

        if (false !== $info) {
            $creationDateTime = DateTime::createFromFormat('YmdHis', $info['created_at']);
            $modifiedDateTime = DateTime::createFromFormat('YmdHis', $info['modified_at']);
            $secsSinceLastEdit = (new DateTime)->getTimestamp() - $modifiedDateTime->getTimestamp();

            // Some wikis (such as foundation.wikimedia.org) may be missing the creation date.
            $creationDateTime = false === $creationDateTime
                ? null
                : $creationDateTime->format('Y-m-d');

            $assessment = $page->getProject()
                ->getPageAssessments()
                ->getAssessment($page);

            $data = array_merge($data, [
                'revisions' => (int) $info['num_edits'],
                'editors' => (int) $info['num_editors'],
                'minor_edits' => (int) $info['minor_edits'],
                'author' => $info['author'],
                'author_editcount' => (int) $info['author_editcount'],
                'created_at' => $creationDateTime,
                'created_rev_id' => $info['created_rev_id'],
                'modified_at' => $modifiedDateTime->format('Y-m-d H:i'),
                'secs_since_last_edit' => $secsSinceLastEdit,
                'last_edit_id' => (int) $info['modified_rev_id'],
                'assessment' => $assessment,
            ]);
        }

        return $data;
    }

    /************************ Link statistics ************************/

    /**
     * Get the number of external links on the page.
     * @return int
     */
    public function linksExtCount(): int
    {
        return $this->getLinksAndRedirects()['links_ext_count'];
    }

    /**
     * Get the number of incoming links to the page.
     * @return int
     */
    public function linksInCount(): int
    {
        return $this->getLinksAndRedirects()['links_in_count'];
    }

    /**
     * Get the number of outgoing links from the page.
     * @return int
     */
    public function linksOutCount(): int
    {
        return $this->getLinksAndRedirects()['links_out_count'];
    }

    /**
     * Get the number of redirects to the page.
     * @return int
     */
    public function redirectsCount(): int
    {
        return $this->getLinksAndRedirects()['redirects_count'];
    }

    /**
     * Get the number of external, incoming and outgoing links, along with the number of redirects to the page.
     * @return int[]
     * @codeCoverageIgnore
     */
    private function getLinksAndRedirects(): array
    {
        // Scrutinizer issue: the condition is_array($this->linksAndRedirects) is reported as always true.
        if (!is_array($this->linksAndRedirects)) {
            $this->linksAndRedirects = $this->page->countLinksAndRedirects();
        }
        return $this->linksAndRedirects;
    }

    /**
     * Fetch transclusion data (categories, templates and files) that are on the page.
     * @return array With keys 'categories', 'templates' and 'files'.
     */
    public function getTransclusionData(): array
    {
        // Scrutinizer issue: the condition is_array($this->transclusionData) is reported as always true.
        if (!is_array($this->transclusionData)) {
            $this->transclusionData = $this->getRepository()
                ->getTransclusionData($this->page);
        }
        return $this->transclusionData;
    }

    /**
     * Get the number of categories that are on the page.
     * @return int
     */
    public function getNumCategories(): int
    {
        return $this->getTransclusionData()['categories'];
    }

    /**
     * Get the number of templates that are on the page.
     * @return int
     */
    public function getNumTemplates(): int
    {
        return $this->getTransclusionData()['templates'];
    }

    /**
     * Get the number of files that are on the page.
     * @return int
     */
    public function getNumFiles(): int
    {
        return $this->getTransclusionData()['files'];
    }

    /************************ Bot statistics ************************/

    /**
     * Number of edits made to the page by current or former bots.
     * @param array[]|null $bots Used only in unit tests, where we supply mock data for the bots to process.
     * @return int
     */
    public function getBotRevisionCount(?array $bots = null): int
    {
        if (isset($this->botRevisionCount)) {
            return $this->botRevisionCount;
        }

        if (null === $bots) {
            $bots = $this->getBots();
        }

        $count = 0;

        foreach (array_values($bots) as $data) {
            $count += $data['count'];
        }

        $this->botRevisionCount = $count;
        return $count;
    }

    /**
     * Get and set $this->bots, the statistics about bots that edited the page. This is done in a separate
     * method because we need this information when computing the top 10 editors in ArticleInfo, where we
     * don't want to include bots.
     * @return mixed[]
     */
    public function getBots(): array
    {
        if (isset($this->bots)) {
            return $this->bots;
        }

        // Parse the bot edits.
        $this->bots = [];

        // Scrutinizer issue: getBotData() does not exist on AppBundle\Repository\Repository;
        // the code relies on the sub-type AppBundle\Repository\ArticleInfoRepository.
        /** @var Statement $botData */
        $botData = $this->getRepository()->getBotData($this->page, $this->start, $this->end);
        while ($bot = $botData->fetch()) {
            $this->bots[$bot['username']] = [
                'count' => (int)$bot['count'],
                'current' => '1' === $bot['current'],
            ];
        }

        // Sort by edit count.
        uasort($this->bots, function ($a, $b) {
            return $b['count'] - $a['count'];
        });

        return $this->bots;
    }

    /**
     * Get the number of bots that edited the page.
     * @return int
     */
    public function getNumBots(): int
    {
        return count($this->getBots());
    }
}
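
The per-paragraph arithmetic in countCharsAndWords() can be tried in isolation. The following is a minimal sketch, assuming plain strings in place of Crawler nodes; the function name countCharsAndWordsInText() is hypothetical and not part of ArticleInfoApi. It illustrates two properties of the approach: strlen() counts bytes rather than characters, and explode() on an empty string still yields one element.

```php
<?php
declare(strict_types = 1);

/**
 * Hypothetical standalone helper (not part of ArticleInfoApi) that applies the
 * same steps as countCharsAndWords() to an array of plain-text paragraphs.
 * @param string[] $paragraphs
 * @return int[] [num chars, num words]
 */
function countCharsAndWordsInText(array $paragraphs): array
{
    $totalChars = 0;
    $totalWords = 0;
    foreach ($paragraphs as $paragraph) {
        // Strip footnote markers such as [1] or [23] after trimming.
        $text = preg_replace('/\[\d+]/', '', trim($paragraph));
        // strlen() counts bytes, so multibyte characters count more than once.
        $totalChars += strlen($text);
        // explode() returns [''] for an empty string, which counts as one word.
        $totalWords += count(explode(' ', $text));
    }
    return [$totalChars, $totalWords];
}

[$chars, $words] = countCharsAndWordsInText(['Hello world.[1]', 'Second paragraph here.']);
echo "$chars chars, $words words\n"; // prints "34 chars, 5 words"

[$chars, $words] = countCharsAndWordsInText(['']);
echo "$chars chars, $words words\n"; // prints "0 chars, 1 words"

[$chars, $words] = countCharsAndWordsInText(["caf\u{e9}"]);
echo "$chars chars, $words words\n"; // prints "5 chars, 1 words" (the accented letter is 2 bytes in UTF-8)
```

If character counts rather than byte counts are wanted, mb_strlen() is the usual substitute, but that would change the reported statistics and is not what the class above does.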
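
Scrutinizer's notes on getStartDate() and getEndDate() suggest an additional type check before calling date(). One way this could look is sketched below, under stated assumptions: formatRangeBoundary() is a hypothetical free function, not a method of the class, and it uses gmdate() instead of date() so the output does not depend on the server's timezone. Unlike the original loose comparison, it returns a blank string for any non-integer value.

```php
<?php
declare(strict_types = 1);

/**
 * Hypothetical helper, not part of the original class: format a date-range
 * boundary only when it is an integer Unix timestamp.
 * @param mixed $timestamp false (no boundary), an int, or any other value.
 * @return string Blank if the value is not an integer.
 */
function formatRangeBoundary($timestamp): string
{
    // Assumption: UTC output is acceptable; the original uses date() instead.
    return is_int($timestamp) ? gmdate('Y-m-d', $timestamp) : '';
}

echo formatRangeBoundary(false), "\n";        // prints an empty line
echo formatRangeBoundary(86400), "\n";        // prints "1970-01-02"
echo formatRangeBoundary('2020-01-01'), "\n"; // prints an empty line
```

Whether strings should be rejected or parsed is a design decision for the caller; this sketch only shows the guard that keeps date formatting from receiving a non-integer under strict_types.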