Completed
Push — master ( 5c8e7a...62162d )
by MusikAnimal
22s
created

PagesRepository::getRevisions()   B

Complexity

Conditions 3
Paths 4

Size

Total Lines 25
Code Lines 15

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
dl 0
loc 25
rs 8.8571
c 0
b 0
f 0
cc 3
eloc 15
nc 4
nop 2
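The getRevisions() method rated above follows a cache-aside flow: it builds a key from the page ID (plus a user suffix when filtering by user), returns the cached rows if present, and otherwise queries the database and caches the rows for ten minutes. Its two independent `if` statements are presumably where the four reported paths come from. The sketch below only illustrates that flow. `ArrayCache`, `cachedRevisions()` and the `$loader` callback are hypothetical stand-ins, not XTools code, and the real class uses a cache pool with `hasItem()`/`getItem()`/`save()` instead.

```php
<?php
// Hypothetical in-memory cache, standing in for the real cache pool (assumption).
final class ArrayCache
{
    private $items = [];

    public function has($key)
    {
        return isset($this->items[$key]) && $this->items[$key]['expires'] > time();
    }

    public function get($key)
    {
        return $this->items[$key]['value'];
    }

    public function set($key, $value, $ttlSeconds)
    {
        $this->items[$key] = ['value' => $value, 'expires' => time() + $ttlSeconds];
    }
}

// Hypothetical helper mirroring the control flow of getRevisions().
function cachedRevisions(ArrayCache $cache, $pageId, $userKey, callable $loader)
{
    // The key is per page, and additionally per user when a user filter is given.
    $cacheKey = 'revisions.' . $pageId;
    if ($userKey !== null) {
        $cacheKey .= '.' . $userKey;
    }

    // Cache hit: skip the expensive query entirely.
    if ($cache->has($cacheKey)) {
        return $cache->get($cacheKey);
    }

    // Cache miss: load, store for 10 minutes (600 seconds), and return.
    $result = $loader();
    $cache->set($cacheKey, $result, 600);
    return $result;
}

$calls = 0;
$loader = function () use (&$calls) {
    $calls++;
    return [['id' => 1]];
};

$cache = new ArrayCache();
$first = cachedRevisions($cache, 42, null, $loader);
$second = cachedRevisions($cache, 42, null, $loader);
echo "Loader calls: $calls\n";
```

The second call is served from the cache, so the loader runs only once.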
<?php
/**
 * This file contains only the PagesRepository class.
 */

namespace Xtools;

use DateInterval;
use Mediawiki\Api\SimpleRequest;
use GuzzleHttp;

/**
 * A PagesRepository fetches data about Pages, either one at a time or in batches.
 */
class PagesRepository extends Repository
{

    /**
     * Get metadata about a single page from the API.
     * @param Project $project The project to which the page belongs.
     * @param string $pageTitle Page title.
     * @return string[] Array with some of the following keys: pageid, title, missing, displaytitle,
     * url.
     */
    public function getPageInfo(Project $project, $pageTitle)
    {
        $info = $this->getPagesInfo($project, [$pageTitle]);
        return array_shift($info);
    }

    /**
     * Get metadata about a set of pages from the API.
     * @param Project $project The project to which the pages belong.
     * @param string[] $pageTitles Array of page titles.
     * @return string[] Array keyed by the page names, each element with some of the
     * following keys: pageid, title, missing, displaytitle, url.
     */
    public function getPagesInfo(Project $project, $pageTitles)
    {
        // @TODO: Also include 'extlinks' prop when we start checking for dead external links.
        $params = [
            'prop' => 'info|pageprops',
            'inprop' => 'protection|talkid|watched|watchers|notificationtimestamp|subjectid|url|readable|displaytitle',
            'converttitles' => '',
            // 'ellimit' => 20,
Unused Code (Comprehensibility): 58% of this comment could be valid code. Did you maybe forget this after debugging?

Sometimes obsolete code just ends up commented out instead of removed. In this case it is better to remove the code once you have checked you do not need it.

The code might also have been commented out for debugging purposes. In this case it is vital that someone uncomments it again or your project may behave in very unexpected ways in production.

This check looks for comments that seem to be mostly valid code and reports them.

            // 'elexpandurl' => '',
Unused Code (Comprehensibility): 58% of this comment could be valid code.
            'titles' => join('|', $pageTitles),
            'formatversion' => 2
            // 'pageids' => $pageIds // FIXME: allow page IDs
Unused Code (Comprehensibility): 43% of this comment could be valid code.
        ];

        $query = new SimpleRequest('query', $params);
        $api = $this->getMediawikiApi($project);
        $res = $api->getRequest($query);
        $result = [];
        if (isset($res['query']['pages'])) {
            foreach ($res['query']['pages'] as $pageInfo) {
                $result[$pageInfo['title']] = $pageInfo;
            }
        }
        return $result;
    }

    /**
     * Get the full page text of a set of pages.
     * @param Project $project The project to which the pages belong.
     * @param string[] $pageTitles Array of page titles.
     * @return string[] Array keyed by the page names, with the page text as the values.
     */
    public function getPagesWikitext(Project $project, $pageTitles)
    {
        $query = new SimpleRequest('query', [
            'prop' => 'revisions',
            'rvprop' => 'content',
            'titles' => join('|', $pageTitles),
            'formatversion' => 2,
        ]);
        $result = [];

        $api = $this->getMediawikiApi($project);
        $res = $api->getRequest($query);

        if (!isset($res['query']['pages'])) {
            return [];
        }

        foreach ($res['query']['pages'] as $page) {
            if (isset($page['revisions'][0]['content'])) {
                $result[$page['title']] = $page['revisions'][0]['content'];
            } else {
                $result[$page['title']] = '';
            }
        }

        return $result;
    }

    /**
     * Get revisions of a single page.
     * @param Page $page The page.
     * @param User|null $user Specify to get only revisions by the given user.
     * @return string[] Each member with keys: id, timestamp, minor, length, length_change,
     * user_id, username, comment.
     */
    public function getRevisions(Page $page, User $user = null)
    {
        $cacheKey = 'revisions.'.$page->getId();
        if ($user) {
            $cacheKey .= '.'.$user->getCacheKey();
        }

        if ($this->cache->hasItem($cacheKey)) {
            return $this->cache->getItem($cacheKey)->get();
        }

        $this->stopwatch->start($cacheKey, 'XTools');

        $stmt = $this->getRevisionsStmt($page, $user);
        $result = $stmt->fetchAll();

        // Cache for 10 minutes, and return.
Unused Code (Comprehensibility): 36% of this comment could be valid code.
        $cacheItem = $this->cache->getItem($cacheKey)
            ->set($result)
            ->expiresAfter(new DateInterval('PT10M'));
        $this->cache->save($cacheItem);
        $this->stopwatch->stop($cacheKey);

        return $result;
    }

    /**
     * Get the statement for the revisions of a single page, so that you can iterate row by row.
     * @param Page $page The page.
     * @param User|null $user Specify to get only revisions by the given user.
     * @return \Doctrine\DBAL\Driver\PDOStatement
     */
    public function getRevisionsStmt(Page $page, User $user = null)
Duplication: This method seems to be duplicated in your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

    {
        $revTable = $this->getTableName($page->getProject()->getDatabaseName(), 'revision');
        $userClause = $user ? "revs.rev_user_text in (:username) AND " : "";

        $sql = "SELECT
                    revs.rev_id AS id,
                    revs.rev_timestamp AS timestamp,
                    revs.rev_minor_edit AS minor,
                    revs.rev_len AS length,
                    (CAST(revs.rev_len AS SIGNED) - IFNULL(parentrevs.rev_len, 0)) AS length_change,
                    revs.rev_user AS user_id,
                    revs.rev_user_text AS username,
                    revs.rev_comment AS comment
                FROM $revTable AS revs
                LEFT JOIN $revTable AS parentrevs ON (revs.rev_parent_id = parentrevs.rev_id)
                WHERE $userClause revs.rev_page = :pageid
                ORDER BY revs.rev_timestamp ASC";

        $params = ['pageid' => $page->getId()];
        if ($user) {
            $params['username'] = $user->getUsername();
        }

        $conn = $this->getProjectsConnection();
        return $conn->executeQuery($sql, $params);
    }

    /**
     * Get a count of the number of revisions of a single page.
     * @param Page $page The page.
     * @param User|null $user Specify to only count revisions by the given user.
     * @return int
     */
    public function getNumRevisions(Page $page, User $user = null)
Duplication: This method seems to be duplicated in your project.
    {
        $revTable = $this->getTableName($page->getProject()->getDatabaseName(), 'revision');
        $userClause = $user ? "rev_user_text in (:username) AND " : "";

        $sql = "SELECT COUNT(*)
                FROM $revTable
                WHERE $userClause rev_page = :pageid";
        $params = ['pageid' => $page->getId()];
        if ($user) {
            $params['username'] = $user->getUsername();
        }
        $conn = $this->getProjectsConnection();
        // The driver returns the count as a string; cast it to match the documented int.
        return (int) $conn->executeQuery($sql, $params)->fetchColumn(0);
    }

    /**
     * Get various basic info used in the API, including the
     *   number of revisions, unique authors, initial author
     *   and edit count of the initial author.
     * This is combined into one query for better performance.
     * Caching is intentionally disabled, because using the gadget,
     *   this will get hit for a different page constantly, where
     *   the likelihood of cache benefiting us is slim.
     * @param Page $page The page.
     * @return string[]
     */
    public function getBasicEditingInfo(Page $page)
    {
        $revTable = $this->getTableName($page->getProject()->getDatabaseName(), 'revision');
        $userTable = $this->getTableName($page->getProject()->getDatabaseName(), 'user');
        $pageTable = $this->getTableName($page->getProject()->getDatabaseName(), 'page');

        $sql = "SELECT *, (
                   SELECT user_editcount
                   FROM $userTable
                   WHERE user_name = author
                ) AS author_editcount
                FROM (
                    (
                        SELECT COUNT(*) AS num_edits,
                               COUNT(DISTINCT(rev_user_text)) AS num_editors
                        FROM $revTable
                        WHERE rev_page = :pageid
                    ) a,
                    (
                        # With really old pages, the rev_timestamp may need to be sorted ASC,
                        #   and the lowest rev_id may not be the first revision.
                        SELECT rev_user_text AS author,
                               rev_timestamp AS created_at,
                               rev_id AS created_rev_id
                        FROM $revTable
                        WHERE rev_page = :pageid
                        ORDER BY rev_timestamp ASC
                        LIMIT 1
                    ) b,
                    (
                        SELECT MAX(rev_timestamp) AS modified_at
                        FROM $revTable
                        WHERE rev_page = :pageid
                    ) c,
                    (
                        SELECT page_latest AS modified_rev_id
                        FROM $pageTable
                        WHERE page_id = :pageid
                    ) d
                );";
        $params = ['pageid' => $page->getId()];
        $conn = $this->getProjectsConnection();
        return $conn->executeQuery($sql, $params)->fetch();
    }

    /**
     * Get assessment data for the given pages.
     * @param Project $project The project to which the pages belong.
     * @param int[] $pageIds Page IDs.
     * @return string[] Assessment data as retrieved from the database.
     */
    public function getAssessments(Project $project, $pageIds)
    {
        if (!$project->hasPageAssessments()) {
            return [];
        }
        $paTable = $this->getTableName($project->getDatabaseName(), 'page_assessments');
        $papTable = $this->getTableName($project->getDatabaseName(), 'page_assessments_projects');
        // Separator goes first in implode(); the IDs are cast to integers because
        // they are interpolated directly into the SQL below.
        $pageIds = implode(',', array_map('intval', $pageIds));

        $query = "SELECT pap_project_title AS wikiproject, pa_class AS class, pa_importance AS importance
                  FROM $paTable
                  LEFT JOIN $papTable ON pa_project_id = pap_project_id
                  WHERE pa_page_id IN ($pageIds)";

        $conn = $this->getProjectsConnection();
        return $conn->executeQuery($query)->fetchAll();
    }

    /**
     * Get any CheckWiki errors of a single page
     * @param Page $page
     * @return array Results from query
     */
    public function getCheckWikiErrors(Page $page)
    {
        // Only support mainspace on Labs installations
        if ($page->getNamespace() !== 0 || !$this->isLabs()) {
            return [];
        }

        $sql = "SELECT error, notice, found, name_trans AS name, prio, text_trans AS explanation
                FROM s51080__checkwiki_p.cw_error a
                JOIN s51080__checkwiki_p.cw_overview_errors b
                WHERE a.project = b.project
                AND a.project = :dbName
                AND a.title = :title
                AND a.error = b.id
                AND a.ok = 0";

        // remove _p if present
        $dbName = preg_replace('/_p$/', '', $page->getProject()->getDatabaseName());

        // Page title without underscores (str_replace just to be sure)
        $pageTitle = str_replace('_', ' ', $page->getTitle());

        $resultQuery = $this->getToolsConnection()->prepare($sql);
        $resultQuery->bindParam(':dbName', $dbName);
        $resultQuery->bindParam(':title', $pageTitle);
        $resultQuery->execute();

        return $resultQuery->fetchAll();
    }

    /**
     * Get basic Wikidata information about the page: label and description.
     * @param Page $page
     * @return string[] In the format:
     *    [[
     *         'term' => string such as 'label',
     *         'term_text' => string (value for 'label'),
     *     ], ... ]
     */
    public function getWikidataInfo(Page $page)
    {
        if (empty($page->getWikidataId())) {
            return [];
        }

        $wikidataId = ltrim($page->getWikidataId(), 'Q');
        $lang = $page->getProject()->getLang();

        // Placeholders inside a quoted SQL string are not bound, so the page title
        // is built with CONCAT('Q', :wikidataId) rather than the literal 'Q:wikidataId'.
        $sql = "SELECT IF(term_type = 'label', 'label', 'description') AS term, term_text
                FROM wikidatawiki_p.wb_entity_per_page
                JOIN wikidatawiki_p.page ON epp_page_id = page_id
                JOIN wikidatawiki_p.wb_terms ON term_entity_id = epp_entity_id
                    AND term_language = :lang
                    AND term_type IN ('label', 'description')
                WHERE epp_entity_id = :wikidataId

                UNION

                SELECT pl_title AS term, wb_terms.term_text
                FROM wikidatawiki_p.pagelinks
                JOIN wikidatawiki_p.wb_terms ON term_entity_id = SUBSTRING(pl_title, 2)
                    AND term_entity_type = (IF(SUBSTRING(pl_title, 1, 1) = 'Q', 'item', 'property'))
                    AND term_language = :lang
                    AND term_type = 'label'
                WHERE pl_namespace IN (0, 120)
                    AND pl_from = (
                        SELECT page_id FROM wikidatawiki_p.page
                        WHERE page_namespace = 0
                            AND page_title = CONCAT('Q', :wikidataId)
                    )";

        $resultQuery = $this->getProjectsConnection()->prepare($sql);
        $resultQuery->bindParam(':lang', $lang);
        $resultQuery->bindParam(':wikidataId', $wikidataId);
        $resultQuery->execute();

        return $resultQuery->fetchAll();
    }

    /**
     * Get or count all wikidata items for the given page,
     *     not just languages of sister projects
     * @param Page $page
     * @param bool $count Set to true to get only a COUNT
     * @return string[]|int Records as returned by the DB,
     *                      or raw COUNT of the records.
     */
    public function getWikidataItems(Page $page, $count = false)
    {
        if (!$page->getWikidataId()) {
            return $count ? 0 : [];
        }

        $wikidataId = ltrim($page->getWikidataId(), 'Q');

        $sql = "SELECT " . ($count ? 'COUNT(*) AS count' : '*') . "
                FROM wikidatawiki_p.wb_items_per_site
                WHERE ips_item_id = :wikidataId";

        $resultQuery = $this->getProjectsConnection()->prepare($sql);
        $resultQuery->bindParam(':wikidataId', $wikidataId);
        $resultQuery->execute();

        $result = $resultQuery->fetchAll();

        return $count ? (int) $result[0]['count'] : $result;
    }

    /**
     * Get number of in and outgoing links and redirects to the given page.
     * @param Page $page
     * @return string[] Counts with the keys 'links_ext_count', 'links_out_count',
     *                  'links_in_count' and 'redirects_count'
     */
    public function countLinksAndRedirects(Page $page)
    {
        $externalLinksTable = $this->getTableName($page->getProject()->getDatabaseName(), 'externallinks');
        $pageLinksTable = $this->getTableName($page->getProject()->getDatabaseName(), 'pagelinks');
        $redirectTable = $this->getTableName($page->getProject()->getDatabaseName(), 'redirect');

        $sql = "SELECT COUNT(*) AS value, 'links_ext' AS type
                FROM $externalLinksTable WHERE el_from = :id
                UNION
                SELECT COUNT(*) AS value, 'links_out' AS type
                FROM $pageLinksTable WHERE pl_from = :id
                UNION
                SELECT COUNT(*) AS value, 'links_in' AS type
                FROM $pageLinksTable WHERE pl_namespace = :namespace AND pl_title = :title
                UNION
                SELECT COUNT(*) AS value, 'redirects' AS type
                FROM $redirectTable WHERE rd_namespace = :namespace AND rd_title = :title";

        $params = [
            'id' => $page->getId(),
            'title' => str_replace(' ', '_', $page->getTitleWithoutNamespace()),
            'namespace' => $page->getNamespace(),
        ];

        $conn = $this->getProjectsConnection();
        $res = $conn->executeQuery($sql, $params);

        $data = [];

        // Transform to associative array by 'type'
        foreach ($res as $row) {
            $data[$row['type'] . '_count'] = $row['value'];
        }

        return $data;
    }

    /**
     * Count wikidata items for the given page, not just languages of sister projects
     * @param Page $page
     * @return int Number of records.
     */
    public function countWikidataItems(Page $page)
    {
        return $this->getWikidataItems($page, true);
    }

    /**
     * Get page views for the given page and timeframe.
     * @param Page $page
     * @param string|\DateTime $start In the format YYYYMMDD
     * @param string|\DateTime $end In the format YYYYMMDD
     * @return string[]
     */
    public function getPageviews($page, $start, $end)
    {
        $title = rawurlencode(str_replace(' ', '_', $page->getTitle()));
        $client = new GuzzleHttp\Client();

        // The leading backslash refers to PHP's global DateTime class; an unqualified
        // name would resolve to the non-existent Xtools\DateTime and never match.
        if ($start instanceof \DateTime) {
            // 'Ymd' is the format string that produces YYYYMMDD.
            $start = $start->format('Ymd');
        }
        if ($end instanceof \DateTime) {
            $end = $end->format('Ymd');
        }

        $project = $page->getProject()->getDomain();

        $url = 'https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/' .
            "$project/all-access/user/$title/daily/$start/$end";

        $res = $client->request('GET', $url);
Bug: The method request() does not exist on GuzzleHttp\Client. Did you maybe mean createRequest()?

This check marks calls to methods that do not seem to exist on an object.

This is most likely the result of a method being renamed without all references to it being renamed likewise.

        return json_decode($res->getBody()->getContents(), true);
    }
}
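getPageviews() accepts either a string or a DateTime for its date arguments. Two PHP details matter there. Inside a namespaced file, an unqualified class name in an `instanceof` check is resolved relative to that namespace, and PHP raises no error when the class does not exist, so the check is simply false. Also, `DateTime::format()` takes `date()`-style letters, so a YYYYMMDD string is written `'Ymd'`. The sketch below illustrates both points. The `Example` namespace and the `toApiDate()` helper are hypothetical and not part of XTools.

```php
<?php
namespace Example;

/**
 * Hypothetical helper: normalise a date argument to a YYYYMMDD string.
 */
function toApiDate($date)
{
    // The leading backslash selects the global DateTimeInterface rather than
    // Example\DateTimeInterface, which does not exist.
    if ($date instanceof \DateTimeInterface) {
        return $date->format('Ymd');
    }
    // Strings are assumed to be in YYYYMMDD form already.
    return $date;
}

$date = new \DateTime('2017-03-05');

// Unqualified name: resolves to Example\DateTime, so this is false and raises no error.
var_dump($date instanceof DateTime);

// Fully qualified name: matches as intended.
var_dump($date instanceof \DateTime);

echo toApiDate($date), "\n";

// Not a YYYYMMDD string: each Y, M and D letter is expanded on its own.
echo $date->format('YYYYMMDD'), "\n";
```

An alternative to the leading backslash is a `use DateTime;` statement at the top of the file. Both make the name resolve to the global class.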
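getAssessments() places the page IDs directly into the `IN (...)` list of its query. One common alternative is to generate one positional placeholder per ID and pass the IDs as bound parameters. The sketch below shows that idea only. `buildInClause()` is a hypothetical helper, not part of XTools, and the table name in the example query is simplified.

```php
<?php
/**
 * Hypothetical helper: build the placeholder list and parameter array for an IN clause.
 *
 * @param array $ids Page IDs (integers or numeric strings).
 * @return array Two elements: the placeholder string and the integer parameters.
 */
function buildInClause(array $ids)
{
    if (count($ids) === 0) {
        // An empty IN () list is invalid SQL, so refuse it up front.
        throw new InvalidArgumentException('At least one ID is required.');
    }

    // Force every value to an integer and reindex from zero.
    $params = array_values(array_map('intval', $ids));

    // One "?" per ID, joined by commas.
    $placeholders = implode(',', array_fill(0, count($params), '?'));

    return [$placeholders, $params];
}

list($placeholders, $params) = buildInClause([12, '34', 56]);
$sql = "SELECT pa_class FROM page_assessments WHERE pa_page_id IN ($placeholders)";
echo $sql, "\n";
```

The resulting `$sql` and `$params` could then be handed to a call such as `executeQuery($sql, $params)`, so the values never become part of the SQL text.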