GenerateSitemap::generateNamespaces()   A

Complexity: Conditions 3, Paths 3
Size: Total Lines 23, Code Lines 12
Duplication: Lines 0, Ratio 0 %
Importance: Changes 0

Metric  Value
cc      3
eloc    12
nc      3
nop     0
dl      0
loc     23
rs      9.0856
c       0
b       0
f       0
<?php
Coding Style Compatibility introduced by
For compatibility and reusability of your code, PSR-1 recommends that a file should introduce either new symbols (like classes, functions, etc.) or side effects (like outputting something or including other files), but not both at the same time. The first symbol is defined on line 36 (the GenerateSitemap class declaration) and the first side effect is on line 29 (the require_once of Maintenance.php).

PSR-1 (Basic Coding Standard) recommends that a file should either introduce new symbols (classes, functions, constants or similar) or have side effects. Side effects are anything that executes logic, for example printing output, changing ini settings or writing to a file.

The idea behind this recommendation is that merely auto-loading a class should not change the state of an application. It also promotes a cleaner style of programming and makes your code less prone to errors, because the logic is not spread out all over the place.

To learn more about the PSR-1, please see the PHP-FIG site on the PSR-1.
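To illustrate the symbols-only half of that split, here is a minimal sketch of a hypothetical file that only declares a function and does no work when it is loaded. Its body mirrors the custom-priority clamping loop in setNamespacePriorities(). The file name and clampPriority() are assumptions made for illustration. The code that actually runs the script (the require_once calls and the maintenance entry point) would live in a separate file.

```php
<?php
// SitemapPriority.php (hypothetical file name).
// Declares a symbol only: loading this file defines clampPriority()
// and has no other effect (no output, no includes, no ini changes).

/**
 * Hypothetical helper mirroring the loop in
 * GenerateSitemap::setNamespacePriorities(): values above 1.0 become '1.0',
 * values below 0.0 become '0.0', and anything else is returned unchanged.
 */
function clampPriority( string $priority ): string {
	$float = floatval( $priority );
	if ( $float > 1.0 ) {
		return '1.0';
	} elseif ( $float < 0.0 ) {
		return '0.0';
	}
	return $priority;
}
```

An entry-point script would require_once this file and call clampPriority(), so that all side effects stay in the entry point.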

/**
 * Creates a sitemap for the site.
 *
 * Copyright © 2005, Ævar Arnfjörð Bjarmason, Jens Frank and
 * Brion Vibber
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 * http://www.gnu.org/copyleft/gpl.html
 *
 * @file
 * @ingroup Maintenance
 * @see http://www.sitemaps.org/
 * @see http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
 */

require_once __DIR__ . '/Maintenance.php';

/**
 * Maintenance script that generates a sitemap for the site.
 *
 * @ingroup Maintenance
 */
class GenerateSitemap extends Maintenance {
	const GS_MAIN = -2;
	const GS_TALK = -1;

	/**
	 * The maximum number of URLs in a sitemap file
	 *
	 * @link http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
	 *
	 * @var int
	 */
	public $url_limit;

	/**
	 * The maximum size of a sitemap file
	 *
	 * @link http://www.sitemaps.org/faq.php#faq_sitemap_size
	 *
	 * @var int
	 */
	public $size_limit;

	/**
	 * The path to prepend to the filename
	 *
	 * @var string
	 */
	public $fspath;

	/**
	 * The URL path to prepend to filenames in the index;
	 * should resolve to the same directory as $fspath.
	 *
	 * @var string
	 */
	public $urlpath;

	/**
	 * Whether or not to use compression
	 *
	 * @var bool
	 */
	public $compress;

	/**
	 * Whether or not to skip redirect pages
	 *
	 * @var bool
	 */
	public $skipRedirects;

	/**
	 * Byte lengths used to size each sitemap file: the opening XML, the
	 * longest possible entry and the closing XML (set by generateLimit())
	 *
	 * @var array
	 */
	public $limit = [];

	/**
	 * Key => value entries of namespaces and their priorities
	 *
	 * @var array
	 */
	public $priorities = [];

	/**
	 * A one-dimensional array of namespaces in the wiki
	 *
	 * @var array
	 */
	public $namespaces = [];

	/**
	 * When this sitemap batch was generated
	 *
	 * @var string
	 */
	public $timestamp;

	/**
	 * A connection to a replica database
	 *
	 * @var object
	 */
	public $dbr;

	/**
	 * A resource pointing to the sitemap index file
	 *
	 * @var resource
	 */
	public $findex;

	/**
	 * A resource pointing to a sitemap file
	 *
	 * @var resource
	 */
	public $file;

	/**
	 * Identifier to use in filenames, default $wgDBname
	 *
	 * @var string
	 */
	private $identifier;

	/**
	 * Constructor
	 */
	public function __construct() {
		parent::__construct();
		$this->addDescription( 'Creates a sitemap for the site' );
		$this->addOption(
			'fspath',
			'The file system path to save to, e.g. /tmp/sitemap; defaults to current directory',
			false,
			true
		);
		$this->addOption(
			'urlpath',
			'The URL path corresponding to --fspath, prepended to filenames in the index; '
				. 'defaults to an empty string',
			false,
			true
		);
		$this->addOption(
			'compress',
			'Compress the sitemap files, can take value yes|no, default yes',
			false,
			true
		);
		$this->addOption( 'skip-redirects', 'Do not include redirecting articles in the sitemap' );
		$this->addOption(
			'identifier',
			'What site identifier to use for the wiki, defaults to $wgDBname',
			false,
			true
		);
	}

	/**
	 * Execute
	 */
	public function execute() {
		$this->setNamespacePriorities();
		$this->url_limit = 50000;
		$this->size_limit = pow( 2, 20 ) * 10;
Documentation Bug introduced by
It seems like pow(2, 20) * 10 can also be of type double. However, the property $size_limit is declared as type integer. Maybe add an additional type check?

Our type inference engine has found a suspicious assignment of a value to a property. This check raises an issue when a value that can be of a mixed type is assigned to a property that is type hinted more strictly.

For example, imagine you have a variable $account_id that can hold either an Id object or false (if there is no account id yet). Your code then assigns that value to the id property of an instance of the Account class. Because this class holds a proper account, the id value must no longer be false.

Either this assignment is in error or a type check should be added for that assignment.

class Id
{
    public $id;

    public function __construct($id)
    {
        $this->id = $id;
    }

}

class Account
{
    /** @var  Id $id */
    public $id;
}

$account_id = false;

if (starsAreRight()) {
    $account_id = new Id(42);
}

$account = new Account();
if ($account_id instanceof Id)
{
    $account->id = $account_id;
}
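For this particular assignment the warning is conservative: per the PHP manual, pow() returns an int when both arguments are non-negative integers and the result fits in an integer, so pow( 2, 20 ) * 10 is already the int 10485760 at runtime. The inferred double comes from pow()'s declared int|float return type. A minimal sketch of a form that analyzers can type as int directly; the constant name is hypothetical:

```php
<?php
// Hypothetical constant name for the 10 MiB per-file limit the script uses.
// A product of integer literals is inferred as int by static analyzers.
const SITEMAP_SIZE_LIMIT = 10 * 1024 * 1024;

// The original expression: pow() with non-negative int arguments returns
// an int when the result fits, so this is the same int value at runtime.
$original = pow( 2, 20 ) * 10;

var_dump( SITEMAP_SIZE_LIMIT ); // int(10485760)
var_dump( $original );          // int(10485760)
var_dump( $original === SITEMAP_SIZE_LIMIT ); // bool(true)
```

An explicit (int) cast on the original expression would also match the declared type, at the cost of hiding the intent less clearly than the literal product.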

		# Create directory if needed
		$fspath = $this->getOption( 'fspath', getcwd() );
		if ( !wfMkdirParents( $fspath, null, __METHOD__ ) ) {
			$this->error( "Can not create directory $fspath.", 1 );
		}

		$this->fspath = realpath( $fspath ) . DIRECTORY_SEPARATOR;
		$this->urlpath = $this->getOption( 'urlpath', "" );
		if ( $this->urlpath !== "" && substr( $this->urlpath, -1 ) !== '/' ) {
			$this->urlpath .= '/';
		}
		$this->identifier = $this->getOption( 'identifier', wfWikiID() );
		$this->compress = $this->getOption( 'compress', 'yes' ) !== 'no';
		$this->skipRedirects = $this->getOption( 'skip-redirects', false ) !== false;
		$this->dbr = $this->getDB( DB_REPLICA );
		$this->generateNamespaces();
		$this->timestamp = wfTimestamp( TS_ISO_8601, wfTimestampNow() );
Documentation Bug introduced by
It seems like wfTimestamp(TS_ISO_8601, wfTimestampNow()) can also be of type false. However, the property $timestamp is declared as type string. Maybe add an additional type check?
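wfTimestamp() returns false when it cannot interpret its input, which is where the string|false type comes from. A common fix is to test for false before assigning to a string-typed property. The sketch below does not call MediaWiki. toIso8601() is a hypothetical stand-in that, like wfTimestamp(), returns either a string or false. It assumes a 14-digit UTC YmdHis input and a Y-m-d\TH:i:s\Z output purely for illustration.

```php
<?php
/**
 * Hypothetical stand-in for wfTimestamp( TS_ISO_8601, $ts ): converts a
 * 14-digit UTC "YmdHis" timestamp to ISO 8601, or returns false.
 *
 * @return string|false
 */
function toIso8601( string $mwTimestamp ) {
	if ( !preg_match( '/^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)$/D', $mwTimestamp, $m ) ) {
		return false;
	}
	list( , $y, $mo, $d, $h, $i, $s ) = $m;
	// Reject impossible dates and times instead of silently rolling them over.
	if ( !checkdate( (int)$mo, (int)$d, (int)$y )
		|| (int)$h > 23 || (int)$i > 59 || (int)$s > 59
	) {
		return false;
	}
	return sprintf( '%s-%s-%sT%s:%s:%sZ', $y, $mo, $d, $h, $i, $s );
}

/**
 * Narrows string|false to string before it reaches a string-typed property.
 */
function requireTimestamp( string $mwTimestamp ): string {
	$iso = toIso8601( $mwTimestamp );
	if ( $iso === false ) {
		throw new UnexpectedValueException( "Cannot format timestamp '$mwTimestamp'" );
	}
	return $iso;
}

echo requireTimestamp( '20170102030405' ), "\n"; // 2017-01-02T03:04:05Z
```

Throwing here matches how the script already treats other unrecoverable setup failures, such as open() throwing MWException when a file cannot be opened.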

		$this->findex = fopen( "{$this->fspath}sitemap-index-{$this->identifier}.xml", 'wb' );
		$this->main();
	}

	private function setNamespacePriorities() {
		global $wgSitemapNamespacesPriorities;

		// Custom main namespaces
		$this->priorities[self::GS_MAIN] = '0.5';
		// Custom talk namespaces
		$this->priorities[self::GS_TALK] = '0.1';
		// MediaWiki standard namespaces
		$this->priorities[NS_MAIN] = '1.0';
		$this->priorities[NS_TALK] = '0.1';
		$this->priorities[NS_USER] = '0.5';
		$this->priorities[NS_USER_TALK] = '0.1';
		$this->priorities[NS_PROJECT] = '0.5';
		$this->priorities[NS_PROJECT_TALK] = '0.1';
		$this->priorities[NS_FILE] = '0.5';
		$this->priorities[NS_FILE_TALK] = '0.1';
		$this->priorities[NS_MEDIAWIKI] = '0.0';
		$this->priorities[NS_MEDIAWIKI_TALK] = '0.1';
		$this->priorities[NS_TEMPLATE] = '0.0';
		$this->priorities[NS_TEMPLATE_TALK] = '0.1';
		$this->priorities[NS_HELP] = '0.5';
		$this->priorities[NS_HELP_TALK] = '0.1';
		$this->priorities[NS_CATEGORY] = '0.5';
		$this->priorities[NS_CATEGORY_TALK] = '0.1';

		// Custom priorities
		if ( $wgSitemapNamespacesPriorities !== false ) {
			/**
			 * @var array $wgSitemapNamespacesPriorities
			 */
			foreach ( $wgSitemapNamespacesPriorities as $namespace => $priority ) {
				$float = floatval( $priority );
				if ( $float > 1.0 ) {
					$priority = '1.0';
				} elseif ( $float < 0.0 ) {
					$priority = '0.0';
				}
				$this->priorities[$namespace] = $priority;
			}
		}
	}

	/**
	 * Generate a one-dimensional array of existing namespaces
	 */
	function generateNamespaces() {
		// Only generate for specific namespaces if $wgSitemapNamespaces is an array.
		global $wgSitemapNamespaces;
		if ( is_array( $wgSitemapNamespaces ) ) {
			$this->namespaces = $wgSitemapNamespaces;

			return;
		}

		$res = $this->dbr->select( 'page',
			[ 'page_namespace' ],
			[],
			__METHOD__,
			[
				'GROUP BY' => 'page_namespace',
				'ORDER BY' => 'page_namespace',
			]
		);

		foreach ( $res as $row ) {
			$this->namespaces[] = $row->page_namespace;
		}
	}

	/**
	 * Get the priority of a given namespace
	 *
	 * @param int $namespace The namespace to get the priority for
	 * @return string
	 */
	function priority( $namespace ) {
		return isset( $this->priorities[$namespace] )
			? $this->priorities[$namespace]
			: $this->guessPriority( $namespace );
	}

	/**
	 * If the namespace isn't listed in the priority list, return the
	 * default priority for the namespace, which depends on whether or not
	 * it is a talk namespace.
	 *
	 * @param int $namespace The namespace to get the priority for
	 * @return string
	 */
	function guessPriority( $namespace ) {
		return MWNamespace::isSubject( $namespace )
			? $this->priorities[self::GS_MAIN]
			: $this->priorities[self::GS_TALK];
	}

	/**
	 * Return a database result for all the pages in a given namespace
	 *
	 * @param int $namespace Limit the query to this namespace
	 * @return Resource
	 */
	function getPageRes( $namespace ) {
		return $this->dbr->select( 'page',
			[
				'page_namespace',
				'page_title',
				'page_touched',
				'page_is_redirect'
			],
			[ 'page_namespace' => $namespace ],
			__METHOD__
		);
	}

	/**
	 * Main loop
	 */
	public function main() {
		global $wgContLang;

		fwrite( $this->findex, $this->openIndex() );

		foreach ( $this->namespaces as $namespace ) {
			$res = $this->getPageRes( $namespace );
			$this->file = false;
Documentation Bug introduced by
It seems like false of type false is incompatible with the declared type resource of property $file.
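The script uses false as a "no file open yet" sentinel for $file, so the documented type resource is narrower than the values actually stored. One option is to widen the documented type (resource|false, or switch the sentinel to null) and narrow it before each write. A minimal sketch of the null-sentinel variant follows. The class name CurrentFile is hypothetical, and an in-memory php://memory stream stands in for a real sitemap file.

```php
<?php
/**
 * Hypothetical holder for the "current sitemap file" handle. Null is the
 * documented "nothing open" sentinel (the script itself uses false).
 */
class CurrentFile {
	/** @var resource|null Null until open() succeeds, and again after close(). */
	private $handle = null;

	public function open( string $path ): void {
		$handle = fopen( $path, 'w+b' );
		if ( $handle === false ) {
			throw new RuntimeException( "Cannot open $path" );
		}
		$this->handle = $handle;
	}

	public function isOpen(): bool {
		return $this->handle !== null;
	}

	/** Narrows the nullable handle to a resource before every write. */
	public function write( string $str ): void {
		fwrite( $this->requireHandle(), $str );
	}

	/** Reads back everything written so far (handy with php://memory). */
	public function contents(): string {
		$handle = $this->requireHandle();
		rewind( $handle );
		return (string)stream_get_contents( $handle );
	}

	public function close(): void {
		if ( $this->handle !== null ) {
			fclose( $this->handle );
			$this->handle = null;
		}
	}

	/** @return resource */
	private function requireHandle() {
		if ( $this->handle === null ) {
			throw new LogicException( 'No sitemap file is open' );
		}
		return $this->handle;
	}
}

$file = new CurrentFile();
$file->open( 'php://memory' );
$file->write( "<urlset>\n" );
$file->close();
```

Keeping the null check in a single private method means every write path narrows the type the same way, which is what the analyzer is asking for.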

Our type inference engine has found an assignment to a property that is incompatible with the declared type of that property.

Either this assignment is in error or the assigned type should be added to the documentation/type hint for that property.

			$this->generateLimit( $namespace );
			$length = $this->limit[0];
			$i = $smcount = 0;

			$fns = $wgContLang->getFormattedNsText( $namespace );
			$this->output( "$namespace ($fns)\n" );
			$skippedRedirects = 0; // Number of redirects skipped for that namespace
			foreach ( $res as $row ) {
Bug introduced by
The expression $res of type resource is not traversable.
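In MediaWiki the database select() call normally returns a result-wrapper object (ResultWrapper) that implements PHP's Iterator interface, so foreach over $res works at runtime. The warning follows from getPageRes() documenting its return value as Resource. Documenting the wrapper type instead would address it, for example IResultWrapper, assuming the MediaWiki version in use provides that interface. The sketch below uses a hypothetical in-memory stand-in (FakeResult) to show the same traversal that generateNamespaces() performs over a typed, iterable result.

```php
<?php
/**
 * Hypothetical in-memory stand-in for a database result wrapper: each row is
 * an object with one property per selected column, and the set is iterable.
 */
class FakeResult implements IteratorAggregate {
	/** @var stdClass[] */
	private $rows;

	public function __construct( array $rows ) {
		$this->rows = array_map( function ( array $row ) {
			return (object)$row;
		}, $rows );
	}

	public function getIterator(): Iterator {
		return new ArrayIterator( $this->rows );
	}
}

/**
 * Mirrors the loop at the end of generateNamespaces(). Declaring the
 * parameter as iterable tells analyzers that foreach is valid.
 *
 * @param iterable $res Rows exposing a page_namespace property
 * @return int[]
 */
function collectNamespaces( iterable $res ): array {
	$namespaces = [];
	foreach ( $res as $row ) {
		// Cast added here because database drivers may return numeric strings.
		$namespaces[] = (int)$row->page_namespace;
	}
	return $namespaces;
}

print_r( collectNamespaces( new FakeResult( [
	[ 'page_namespace' => 0 ],
	[ 'page_namespace' => 2 ],
	[ 'page_namespace' => 4 ],
] ) ) );
```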
				if ( $this->skipRedirects && $row->page_is_redirect ) {
					$skippedRedirects++;
					continue;
				}

				if ( $i++ === 0
					|| $i === $this->url_limit + 1
					|| $length + $this->limit[1] + $this->limit[2] > $this->size_limit
				) {
					if ( $this->file !== false ) {
						$this->write( $this->file, $this->closeFile() );
						$this->close( $this->file );
					}
					$filename = $this->sitemapFilename( $namespace, $smcount++ );
					$this->file = $this->open( $this->fspath . $filename, 'wb' );
					$this->write( $this->file, $this->openFile() );
					fwrite( $this->findex, $this->indexEntry( $filename ) );
					$this->output( "\t$this->fspath$filename\n" );
					$length = $this->limit[0];
					$i = 1;
				}
				$title = Title::makeTitle( $row->page_namespace, $row->page_title );
				$date = wfTimestamp( TS_ISO_8601, $row->page_touched );
				$entry = $this->fileEntry( $title->getCanonicalURL(), $date, $this->priority( $namespace ) );
Security Bug introduced by
It seems like $title->getCanonicalURL() targeting Title::getCanonicalURL() can also be of type false; however, GenerateSitemap::fileEntry() does only seem to accept string, did you maybe forget to handle an error condition?
Security Bug introduced by
It seems like $date defined by wfTimestamp(TS_ISO_8601, $row->page_touched) on line 361 can also be of type false; however, GenerateSitemap::fileEntry() does only seem to accept string, did you maybe forget to handle an error condition?

This check looks for type mismatches where the missing type is false. This usually indicates an error condition.

Consider the following example:

<?php

function parseDate($date)
{
    if ($date !== null) {
        return new DateTime($date);
    }

    return false;
}

This function either returns a new DateTime object or false, if there was an error. This is a typical pattern in PHP programming to show that an error has occurred without raising an exception. The calling code should check for this returned false before passing on the value to another function or method that may not be able to handle a false.
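Applied to this script, both Title::getCanonicalURL() and wfTimestamp() are flagged as possibly returning false. Passing false on would write an empty <loc> or <lastmod>. A sketch of skipping the entry instead follows. The fileEntry() body is copied from the script. The string type hints, strict_types, and the entryOrNull() guard are hypothetical additions.

```php
<?php
declare( strict_types = 1 );

/** Same body as GenerateSitemap::fileEntry(), with string type hints added. */
function fileEntry( string $url, string $date, string $priority ): string {
	return
		"\t<url>\n" .
		"\t\t<loc>" . htmlspecialchars( $url ) . "</loc>\n" .
		"\t\t<lastmod>$date</lastmod>\n" .
		"\t\t<priority>$priority</priority>\n" .
		"\t</url>\n";
}

/**
 * Hypothetical guard: returns null (so the caller can skip the page)
 * when either upstream call signalled failure with false.
 *
 * @param string|false $url
 * @param string|false $date
 */
function entryOrNull( $url, $date, string $priority ): ?string {
	if ( $url === false || $date === false ) {
		return null;
	}
	return fileEntry( $url, $date, $priority );
}

echo entryOrNull( 'https://example.org/wiki/A&B', '2017-01-02T03:04:05Z', '1.0' );
```

With strict_types enabled, calling the typed fileEntry() directly with false would raise a TypeError instead of silently producing an empty element. The guard turns that case into a skippable null.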

				$length += strlen( $entry );
				$this->write( $this->file, $entry );
Security Bug introduced by
It seems like $this->file can also be of type false; however, GenerateSitemap::write() does only seem to accept resource, did you maybe forget to handle an error condition?
				// generate pages for language variants
				if ( $wgContLang->hasVariants() ) {
					$variants = $wgContLang->getVariants();
					foreach ( $variants as $vCode ) {
						if ( $vCode == $wgContLang->getCode() ) {
							continue; // we don't want the default variant
						}
						$entry = $this->fileEntry(
							$title->getCanonicalURL( '', $vCode ),
Security Bug introduced by
It seems like $title->getCanonicalURL('', $vCode) targeting Title::getCanonicalURL() can also be of type false; however, GenerateSitemap::fileEntry() does only seem to accept string, did you maybe forget to handle an error condition?
							$date,
Security Bug introduced by
It seems like $date defined by wfTimestamp(TS_ISO_8601, $row->page_touched) on line 361 can also be of type false; however, GenerateSitemap::fileEntry() does only seem to accept string, did you maybe forget to handle an error condition?

							$this->priority( $namespace )
						);
						$length += strlen( $entry );
						$this->write( $this->file, $entry );
					}
				}
			}

			if ( $this->skipRedirects && $skippedRedirects > 0 ) {
				$this->output( "  skipped $skippedRedirects redirect(s)\n" );
			}

			if ( $this->file ) {
				$this->write( $this->file, $this->closeFile() );
				$this->close( $this->file );
			}
		}
		fwrite( $this->findex, $this->closeIndex() );
		fclose( $this->findex );
	}

	/**
	 * gzopen() / fopen() wrapper
	 *
	 * @param string $file
	 * @param string $flags
	 * @return resource
	 */
	function open( $file, $flags ) {
		$resource = $this->compress ? gzopen( $file, $flags ) : fopen( $file, $flags );
		if ( $resource === false ) {
			throw new MWException( __METHOD__
				. " error opening file $file with flags $flags. Check permissions?" );
		}

		return $resource;
	}

	/**
	 * gzwrite() / fwrite() wrapper
	 *
	 * @param resource $handle
	 * @param string $str
	 */
	function write( &$handle, $str ) {
		if ( $handle === true || $handle === false ) {
			throw new MWException( __METHOD__ . " was passed a boolean as a file handle.\n" );
		}
		if ( $this->compress ) {
			gzwrite( $handle, $str );
		} else {
			fwrite( $handle, $str );
		}
	}

	/**
	 * gzclose() / fclose() wrapper
	 *
	 * @param resource $handle
	 */
	function close( &$handle ) {
		if ( $this->compress ) {
			gzclose( $handle );
		} else {
			fclose( $handle );
		}
	}

	/**
	 * Get a sitemap filename
	 *
	 * @param int $namespace The namespace
	 * @param int $count The count
	 * @return string
	 */
	function sitemapFilename( $namespace, $count ) {
		$ext = $this->compress ? '.gz' : '';

		return "sitemap-{$this->identifier}-NS_$namespace-$count.xml$ext";
	}

	/**
	 * Return the XML required to open an XML file
	 *
	 * @return string
	 */
	function xmlHead() {
		return '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
	}

	/**
	 * Return the XML schema being used
	 *
	 * @return string
	 */
	function xmlSchema() {
		return 'http://www.sitemaps.org/schemas/sitemap/0.9';
	}

	/**
	 * Return the XML required to open a sitemap index file
	 *
	 * @return string
	 */
	function openIndex() {
		return $this->xmlHead() . '<sitemapindex xmlns="' . $this->xmlSchema() . '">' . "\n";
	}

	/**
	 * Return the XML for a single sitemap index file entry
	 *
	 * @param string $filename The filename of the sitemap file
	 * @return string
	 */
	function indexEntry( $filename ) {
		return
			"\t<sitemap>\n" .
			"\t\t<loc>{$this->urlpath}$filename</loc>\n" .
			"\t\t<lastmod>{$this->timestamp}</lastmod>\n" .
			"\t</sitemap>\n";
	}

	/**
	 * Return the XML required to close a sitemap index file
	 *
	 * @return string
	 */
	function closeIndex() {
		return "</sitemapindex>\n";
	}

	/**
	 * Return the XML required to open a sitemap file
	 *
	 * @return string
	 */
	function openFile() {
		return $this->xmlHead() . '<urlset xmlns="' . $this->xmlSchema() . '">' . "\n";
	}

	/**
	 * Return the XML for a single sitemap entry
	 *
	 * @param string $url An RFC 2396 compliant URL
	 * @param string $date An ISO 8601 date
	 * @param string $priority A priority indicator, 0.0 - 1.0 inclusive with a 0.1 step size
	 * @return string
	 */
	function fileEntry( $url, $date, $priority ) {
		return
			"\t<url>\n" .
			// bug 34666: $url may contain bad characters such as ampersands.
			"\t\t<loc>" . htmlspecialchars( $url ) . "</loc>\n" .
			"\t\t<lastmod>$date</lastmod>\n" .
			"\t\t<priority>$priority</priority>\n" .
			"\t</url>\n";
	}

	/**
	 * Return the XML required to close a sitemap file
	 *
	 * @return string
	 */
	function closeFile() {
		return "</urlset>\n";
	}

	/**
	 * Populate $this->limit with the byte lengths of the opening XML, the
	 * longest possible entry in $namespace, and the closing XML
	 *
	 * @param int $namespace
	 */
	function generateLimit( $namespace ) {
		// bug 17961: make a title with the longest possible URL in this namespace
		$title = Title::makeTitle( $namespace, str_repeat( "\xf0\xa8\xae\x81", 63 ) . "\xe5\x96\x83" );

		$this->limit = [
			strlen( $this->openFile() ),
			strlen( $this->fileEntry(
				$title->getCanonicalURL(),
Security Bug introduced by
It seems like $title->getCanonicalURL() targeting Title::getCanonicalURL() can also be of type false; however, GenerateSitemap::fileEntry() does only seem to accept string, did you maybe forget to handle an error condition?
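The flagged call is how generateLimit() measures the worst case. It builds a title from 63 copies of a 4-byte UTF-8 character (U+28B81) plus one 3-byte character (U+5583), 255 bytes in total, which is assumed here to be the page_title length limit. Percent-encoding expands each non-ASCII byte to three characters, so such a title produces a very long URL. The sketch below checks that arithmetic with rawurlencode() (a stand-in for MediaWiki's own URL encoding) and mirrors the rollover test in main(). The helper name needsNewFile() and the numbers in the tests are hypothetical.

```php
<?php
// Worst-case title from generateLimit(): 63 four-byte characters (U+28B81)
// plus one three-byte character (U+5583).
$longest = str_repeat( "\xf0\xa8\xae\x81", 63 ) . "\xe5\x96\x83";

// Percent-encoding turns every non-ASCII byte into "%XX" (3 characters).
$encoded = rawurlencode( $longest );

/**
 * Hypothetical helper mirroring the size test in main(): start a new file when
 * the bytes written so far, plus one worst-case entry, plus the closing tag
 * would exceed the limit.
 */
function needsNewFile( int $length, int $maxEntry, int $closeLen, int $sizeLimit ): bool {
	return $length + $maxEntry + $closeLen > $sizeLimit;
}

printf( "%d bytes raw, %d bytes encoded\n", strlen( $longest ), strlen( $encoded ) );
// 255 bytes raw, 765 bytes encoded
```

Reserving room for a worst-case entry before every write keeps each file under the limit without having to measure an entry before deciding where it goes.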
				wfTimestamp( TS_ISO_8601, wfTimestamp() ),
Security Bug introduced by
It seems like wfTimestamp(TS_ISO_8601, wfTimestamp()) targeting wfTimestamp() can also be of type false; however, GenerateSitemap::fileEntry() does only seem to accept string, did you maybe forget to handle an error condition?
				$this->priority( $namespace )
			) ),
			strlen( $this->closeFile() )
		];
	}
}

$maintClass = "GenerateSitemap";
require_once RUN_MAINTENANCE_IF_MAIN;