Issues (4122)

Security Analysis    not enabled

This project does not seem to handle request data directly; as such, no vulnerable execution paths were found.

  Cross-Site Scripting
Cross-Site Scripting enables an attacker to inject code into the response of a web-request that is viewed by other users. It can for example be used to bypass access controls, or even to take over other users' accounts.
  File Exposure
File Exposure allows an attacker to gain access to local files that they should not be able to access. These files can, for example, include database credentials or other configuration files.
  File Manipulation
File Manipulation enables an attacker to write custom data to files. This potentially leads to injection of arbitrary code on the server.
  Object Injection
Object Injection enables an attacker to inject an object into PHP code, and can lead to arbitrary code execution, file exposure, or file manipulation attacks.
  Code Injection
Code Injection enables an attacker to execute arbitrary code on the server.
  Response Splitting
Response Splitting can be used to send arbitrary responses.
  File Inclusion
File Inclusion enables an attacker to inject custom files into PHP's file loading mechanism, either explicitly passed to include, or for example via PHP's auto-loading mechanism.
  Command Injection
Command Injection enables an attacker to inject a shell command that is executed with the privileges of the web-server. This can be used to expose sensitive data, or to gain access to your server.
  SQL Injection
SQL Injection enables an attacker to execute arbitrary SQL code on your database server gaining access to user data, or manipulating user data.
  XPath Injection
XPath Injection enables an attacker to modify the parts of an XML document that are read. If that XML document is, for example, used for authentication, this can lead to further vulnerabilities similar to SQL Injection.
  LDAP Injection
LDAP Injection enables an attacker to inject LDAP statements potentially granting permission to run unauthorized queries, or modify content inside the LDAP tree.
  Header Injection
  Other Vulnerability
This category comprises other attack vectors such as manipulating the PHP runtime, loading custom extensions, freezing the runtime, or similar.
  Regex Injection
Regex Injection enables an attacker to execute arbitrary code in your PHP process.
  XML Injection
XML Injection enables an attacker to read files on your local filesystem including configuration files, or can be abused to freeze your web-server process.
  Variable Injection
Variable Injection enables an attacker to overwrite program variables with custom data, and can lead to further vulnerabilities.
Unfortunately, the security analysis is currently not available for your project. If you are a non-commercial open-source project, please contact support to gain access.
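
As a concrete illustration of the Cross-Site Scripting category above, the following minimal sketch escapes untrusted text before embedding it in HTML. The helper name `escapeForHtml` and the sample input are hypothetical and not part of this project; `htmlspecialchars()` is the standard PHP function, which the script in this report also applies to `<loc>` values.

```php
<?php
// Hypothetical helper, for illustration only: escape untrusted text
// so that it is rendered as text rather than interpreted as markup.
function escapeForHtml( string $untrusted ): string {
	return htmlspecialchars( $untrusted, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8' );
}

// Sample attacker-controlled input (made up for this sketch).
$comment = '<script>alert(1)</script>';

echo '<p>' . escapeForHtml( $comment ) . "</p>\n";
// prints: <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

Escaping at the point of output, rather than at the point of input, keeps the stored data unchanged and lets each output context apply its own escaping rules.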

maintenance/generateSitemap.php (1 issue)

1
<?php
Coding Style Compatibility
For compatibility and reusability of your code, PSR-1 recommends that a file should either introduce new symbols (classes, functions, etc.) or have side effects (such as producing output or including other files), but not both. In this file, the first symbol is defined on line 36 and the first side effect is on line 29.

The PSR-1: Basic Coding Standard recommends that a file should either introduce new symbols, that is classes, functions, constants or similar, or have side effects. Side effects are anything that executes logic, like for example printing output, changing ini settings or writing to a file.

The idea behind this recommendation is that merely auto-loading a class should not change the state of an application. It also promotes a cleaner style of programming and makes your code less prone to errors, because the logic is not spread out all over the place.

To learn more about PSR-1, please see the PHP-FIG site.
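
A minimal sketch of the separation PSR-1 describes, assuming a hypothetical two-file layout. `SitemapNaming.php`, `printName.php`, and the `SitemapNaming` class are made-up names for illustration, not files or symbols in this project. The two files are shown in one listing for brevity; in practice each part would live in its own file.

```php
<?php
// --- SitemapNaming.php (hypothetical): declares a symbol, no side effects ---
// Loading this file, e.g. through an autoloader, changes no application state.
final class SitemapNaming {
	public static function filename( string $identifier, int $namespace, int $count, bool $compress ): string {
		$ext = $compress ? '.gz' : '';
		return "sitemap-{$identifier}-NS_{$namespace}-{$count}.xml{$ext}";
	}
}

// --- printName.php (hypothetical): side effects only, declares nothing ---
// In a real layout this file would start with:
//     require_once __DIR__ . '/SitemapNaming.php';
echo SitemapNaming::filename( 'examplewiki', 0, 3, true ), "\n";
// prints: sitemap-examplewiki-NS_0-3.xml.gz
```

The script in this report keeps both parts in one file (the `require_once` on line 29 and the class on line 36), which is what the check flags.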

2
/**
3
 * Creates a sitemap for the site.
4
 *
5
 * Copyright © 2005, Ævar Arnfjörð Bjarmason, Jens Frank <[email protected]> and
6
 * Brion Vibber <[email protected]>
7
 *
8
 * This program is free software; you can redistribute it and/or modify
9
 * it under the terms of the GNU General Public License as published by
10
 * the Free Software Foundation; either version 2 of the License, or
11
 * (at your option) any later version.
12
 *
13
 * This program is distributed in the hope that it will be useful,
14
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
15
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
16
 * GNU General Public License for more details.
17
 *
18
 * You should have received a copy of the GNU General Public License along
19
 * with this program; if not, write to the Free Software Foundation, Inc.,
20
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
21
 * http://www.gnu.org/copyleft/gpl.html
22
 *
23
 * @file
24
 * @ingroup Maintenance
25
 * @see http://www.sitemaps.org/
26
 * @see http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
27
 */
28
29
require_once __DIR__ . '/Maintenance.php';
30
31
/**
32
 * Maintenance script that generates a sitemap for the site.
33
 *
34
 * @ingroup Maintenance
35
 */
36
class GenerateSitemap extends Maintenance {
37
	const GS_MAIN = -2;
38
	const GS_TALK = -1;
39
40
	/**
41
	 * The maximum number of URLs in a sitemap file
42
	 *
43
	 * @link http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
44
	 *
45
	 * @var int
46
	 */
47
	public $url_limit;
48
49
	/**
50
	 * The maximum size of a sitemap file
51
	 *
52
	 * @link http://www.sitemaps.org/faq.php#faq_sitemap_size
53
	 *
54
	 * @var int
55
	 */
56
	public $size_limit;
57
58
	/**
59
	 * The path to prepend to the filename
60
	 *
61
	 * @var string
62
	 */
63
	public $fspath;
64
65
	/**
66
	 * The URL path to prepend to filenames in the index;
67
	 * should resolve to the same directory as $fspath.
68
	 *
69
	 * @var string
70
	 */
71
	public $urlpath;
72
73
	/**
74
	 * Whether or not to use compression
75
	 *
76
	 * @var bool
77
	 */
78
	public $compress;
79
80
	/**
81
	 * Whether or not to skip redirection pages
82
	 *
83
	 * @var bool
84
	 */
85
	public $skipRedirects;
86
87
	/**
88
	 * Byte lengths of the file header, the longest possible entry, and the file footer
89
	 *
90
	 * @var array
91
	 */
92
	public $limit = [];
93
94
	/**
95
	 * Key => value entries of namespaces and their priorities
96
	 *
97
	 * @var array
98
	 */
99
	public $priorities = [];
100
101
	/**
102
	 * A one-dimensional array of namespaces in the wiki
103
	 *
104
	 * @var array
105
	 */
106
	public $namespaces = [];
107
108
	/**
109
	 * When this sitemap batch was generated
110
	 *
111
	 * @var string
112
	 */
113
	public $timestamp;
114
115
	/**
116
	 * A database replica DB object
117
	 *
118
	 * @var object
119
	 */
120
	public $dbr;
121
122
	/**
123
	 * A resource pointing to the sitemap index file
124
	 *
125
	 * @var resource
126
	 */
127
	public $findex;
128
129
	/**
130
	 * A resource pointing to a sitemap file
131
	 *
132
	 * @var resource
133
	 */
134
	public $file;
135
136
	/**
137
	 * Identifier to use in filenames, default $wgDBname
138
	 *
139
	 * @var string
140
	 */
141
	private $identifier;
142
143
	/**
144
	 * Constructor
145
	 */
146
	public function __construct() {
147
		parent::__construct();
148
		$this->addDescription( 'Creates a sitemap for the site' );
149
		$this->addOption(
150
			'fspath',
151
			'The file system path to save to, e.g. /tmp/sitemap; defaults to current directory',
152
			false,
153
			true
154
		);
155
		$this->addOption(
156
			'urlpath',
157
			'The URL path corresponding to --fspath, prepended to filenames in the index; '
158
				. 'defaults to an empty string',
159
			false,
160
			true
161
		);
162
		$this->addOption(
163
			'compress',
164
			'Compress the sitemap files, can take value yes|no, default yes',
165
			false,
166
			true
167
		);
168
		$this->addOption( 'skip-redirects', 'Do not include redirecting articles in the sitemap' );
169
		$this->addOption(
170
			'identifier',
171
			'What site identifier to use for the wiki, defaults to $wgDBname',
172
			false,
173
			true
174
		);
175
	}
176
177
	/**
178
	 * Execute
179
	 */
180
	public function execute() {
181
		$this->setNamespacePriorities();
182
		$this->url_limit = 50000;
183
		$this->size_limit = pow( 2, 20 ) * 10;
184
185
		# Create directory if needed
186
		$fspath = $this->getOption( 'fspath', getcwd() );
187
		if ( !wfMkdirParents( $fspath, null, __METHOD__ ) ) {
188
			$this->error( "Cannot create directory $fspath.", 1 );
189
		}
190
191
		$this->fspath = realpath( $fspath ) . DIRECTORY_SEPARATOR;
192
		$this->urlpath = $this->getOption( 'urlpath', "" );
193
		if ( $this->urlpath !== "" && substr( $this->urlpath, -1 ) !== '/' ) {
194
			$this->urlpath .= '/';
195
		}
196
		$this->identifier = $this->getOption( 'identifier', wfWikiID() );
197
		$this->compress = $this->getOption( 'compress', 'yes' ) !== 'no';
198
		$this->skipRedirects = $this->getOption( 'skip-redirects', false ) !== false;
199
		$this->dbr = $this->getDB( DB_REPLICA );
200
		$this->generateNamespaces();
201
		$this->timestamp = wfTimestamp( TS_ISO_8601, wfTimestampNow() );
202
		$this->findex = fopen( "{$this->fspath}sitemap-index-{$this->identifier}.xml", 'wb' );
203
		$this->main();
204
	}
205
206
	private function setNamespacePriorities() {
207
		global $wgSitemapNamespacesPriorities;
208
209
		// Custom main namespaces
210
		$this->priorities[self::GS_MAIN] = '0.5';
211
		// Custom talk namespaces
212
		$this->priorities[self::GS_TALK] = '0.1';
213
		// MediaWiki standard namespaces
214
		$this->priorities[NS_MAIN] = '1.0';
215
		$this->priorities[NS_TALK] = '0.1';
216
		$this->priorities[NS_USER] = '0.5';
217
		$this->priorities[NS_USER_TALK] = '0.1';
218
		$this->priorities[NS_PROJECT] = '0.5';
219
		$this->priorities[NS_PROJECT_TALK] = '0.1';
220
		$this->priorities[NS_FILE] = '0.5';
221
		$this->priorities[NS_FILE_TALK] = '0.1';
222
		$this->priorities[NS_MEDIAWIKI] = '0.0';
223
		$this->priorities[NS_MEDIAWIKI_TALK] = '0.1';
224
		$this->priorities[NS_TEMPLATE] = '0.0';
225
		$this->priorities[NS_TEMPLATE_TALK] = '0.1';
226
		$this->priorities[NS_HELP] = '0.5';
227
		$this->priorities[NS_HELP_TALK] = '0.1';
228
		$this->priorities[NS_CATEGORY] = '0.5';
229
		$this->priorities[NS_CATEGORY_TALK] = '0.1';
230
231
		// Custom priorities
232
		if ( $wgSitemapNamespacesPriorities !== false ) {
233
			/**
234
			 * @var $wgSitemapNamespacesPriorities array
235
			 */
236
			foreach ( $wgSitemapNamespacesPriorities as $namespace => $priority ) {
237
				$float = floatval( $priority );
238
				if ( $float > 1.0 ) {
239
					$priority = '1.0';
240
				} elseif ( $float < 0.0 ) {
241
					$priority = '0.0';
242
				}
243
				$this->priorities[$namespace] = $priority;
244
			}
245
		}
246
	}
247
248
	/**
249
	 * Generate a one-dimensional array of existing namespaces
250
	 */
251
	function generateNamespaces() {
252
		// Only generate for specific namespaces if $wgSitemapNamespaces is an array.
253
		global $wgSitemapNamespaces;
254
		if ( is_array( $wgSitemapNamespaces ) ) {
255
			$this->namespaces = $wgSitemapNamespaces;
256
257
			return;
258
		}
259
260
		$res = $this->dbr->select( 'page',
261
			[ 'page_namespace' ],
262
			[],
263
			__METHOD__,
264
			[
265
				'GROUP BY' => 'page_namespace',
266
				'ORDER BY' => 'page_namespace',
267
			]
268
		);
269
270
		foreach ( $res as $row ) {
271
			$this->namespaces[] = $row->page_namespace;
272
		}
273
	}
274
275
	/**
276
	 * Get the priority of a given namespace
277
	 *
278
	 * @param int $namespace The namespace to get the priority for
279
	 * @return string
280
	 */
281
	function priority( $namespace ) {
282
		return isset( $this->priorities[$namespace] )
283
			? $this->priorities[$namespace]
284
			: $this->guessPriority( $namespace );
285
	}
286
287
	/**
288
	 * If the namespace isn't listed on the priority list return the
289
	 * default priority for the namespace, varies depending on whether it's
290
	 * a talkpage or not.
291
	 *
292
	 * @param int $namespace The namespace to get the priority for
293
	 * @return string
294
	 */
295
	function guessPriority( $namespace ) {
296
		return MWNamespace::isSubject( $namespace )
297
			? $this->priorities[self::GS_MAIN]
298
			: $this->priorities[self::GS_TALK];
299
	}
300
301
	/**
302
	 * Return a database result set of all the pages in a given namespace
303
	 *
304
	 * @param int $namespace Limit the query to this namespace
305
	 * @return Resource
306
	 */
307
	function getPageRes( $namespace ) {
308
		return $this->dbr->select( 'page',
309
			[
310
				'page_namespace',
311
				'page_title',
312
				'page_touched',
313
				'page_is_redirect'
314
			],
315
			[ 'page_namespace' => $namespace ],
316
			__METHOD__
317
		);
318
	}
319
320
	/**
321
	 * Main loop
322
	 */
323
	public function main() {
324
		global $wgContLang;
325
326
		fwrite( $this->findex, $this->openIndex() );
327
328
		foreach ( $this->namespaces as $namespace ) {
329
			$res = $this->getPageRes( $namespace );
330
			$this->file = false;
331
			$this->generateLimit( $namespace );
332
			$length = $this->limit[0];
333
			$i = $smcount = 0;
334
335
			$fns = $wgContLang->getFormattedNsText( $namespace );
336
			$this->output( "$namespace ($fns)\n" );
337
			$skippedRedirects = 0; // Number of redirects skipped for that namespace
338
			foreach ( $res as $row ) {
339
				if ( $this->skipRedirects && $row->page_is_redirect ) {
340
					$skippedRedirects++;
341
					continue;
342
				}
343
344
				if ( $i++ === 0
345
					|| $i === $this->url_limit + 1
346
					|| $length + $this->limit[1] + $this->limit[2] > $this->size_limit
347
				) {
348
					if ( $this->file !== false ) {
349
						$this->write( $this->file, $this->closeFile() );
350
						$this->close( $this->file );
351
					}
352
					$filename = $this->sitemapFilename( $namespace, $smcount++ );
353
					$this->file = $this->open( $this->fspath . $filename, 'wb' );
354
					$this->write( $this->file, $this->openFile() );
355
					fwrite( $this->findex, $this->indexEntry( $filename ) );
356
					$this->output( "\t$this->fspath$filename\n" );
357
					$length = $this->limit[0];
358
					$i = 1;
359
				}
360
				$title = Title::makeTitle( $row->page_namespace, $row->page_title );
361
				$date = wfTimestamp( TS_ISO_8601, $row->page_touched );
362
				$entry = $this->fileEntry( $title->getCanonicalURL(), $date, $this->priority( $namespace ) );
363
				$length += strlen( $entry );
364
				$this->write( $this->file, $entry );
365
				// generate pages for language variants
366
				if ( $wgContLang->hasVariants() ) {
367
					$variants = $wgContLang->getVariants();
368
					foreach ( $variants as $vCode ) {
369
						if ( $vCode == $wgContLang->getCode() ) {
370
							continue; // we don't want default variant
371
						}
372
						$entry = $this->fileEntry(
373
							$title->getCanonicalURL( '', $vCode ),
374
							$date,
375
							$this->priority( $namespace )
376
						);
377
						$length += strlen( $entry );
378
						$this->write( $this->file, $entry );
379
					}
380
				}
381
			}
382
383
			if ( $this->skipRedirects && $skippedRedirects > 0 ) {
384
				$this->output( "  skipped $skippedRedirects redirect(s)\n" );
385
			}
386
387
			if ( $this->file ) {
388
				$this->write( $this->file, $this->closeFile() );
389
				$this->close( $this->file );
390
			}
391
		}
392
		fwrite( $this->findex, $this->closeIndex() );
393
		fclose( $this->findex );
394
	}
395
396
	/**
397
	 * gzopen() / fopen() wrapper
398
	 *
399
	 * @param string $file
400
	 * @param string $flags
401
	 * @return resource
402
	 */
403
	function open( $file, $flags ) {
404
		$resource = $this->compress ? gzopen( $file, $flags ) : fopen( $file, $flags );
405
		if ( $resource === false ) {
406
			throw new MWException( __METHOD__
407
				. " error opening file $file with flags $flags. Check permissions?" );
408
		}
409
410
		return $resource;
411
	}
412
413
	/**
414
	 * gzwrite() / fwrite() wrapper
415
	 *
416
	 * @param resource $handle
417
	 * @param string $str
418
	 */
419
	function write( &$handle, $str ) {
420
		if ( $handle === true || $handle === false ) {
421
			throw new MWException( __METHOD__ . " was passed a boolean as a file handle.\n" );
422
		}
423
		if ( $this->compress ) {
424
			gzwrite( $handle, $str );
425
		} else {
426
			fwrite( $handle, $str );
427
		}
428
	}
429
430
	/**
431
	 * gzclose() / fclose() wrapper
432
	 *
433
	 * @param resource $handle
434
	 */
435
	function close( &$handle ) {
436
		if ( $this->compress ) {
437
			gzclose( $handle );
438
		} else {
439
			fclose( $handle );
440
		}
441
	}
442
443
	/**
444
	 * Get a sitemap filename
445
	 *
446
	 * @param int $namespace The namespace
447
	 * @param int $count The count
448
	 * @return string
449
	 */
450
	function sitemapFilename( $namespace, $count ) {
451
		$ext = $this->compress ? '.gz' : '';
452
453
		return "sitemap-{$this->identifier}-NS_$namespace-$count.xml$ext";
454
	}
455
456
	/**
457
	 * Return the XML required to open an XML file
458
	 *
459
	 * @return string
460
	 */
461
	function xmlHead() {
462
		return '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
463
	}
464
465
	/**
466
	 * Return the XML schema being used
467
	 *
468
	 * @return string
469
	 */
470
	function xmlSchema() {
471
		return 'http://www.sitemaps.org/schemas/sitemap/0.9';
472
	}
473
474
	/**
475
	 * Return the XML required to open a sitemap index file
476
	 *
477
	 * @return string
478
	 */
479
	function openIndex() {
480
		return $this->xmlHead() . '<sitemapindex xmlns="' . $this->xmlSchema() . '">' . "\n";
481
	}
482
483
	/**
484
	 * Return the XML for a single sitemap indexfile entry
485
	 *
486
	 * @param string $filename The filename of the sitemap file
487
	 * @return string
488
	 */
489
	function indexEntry( $filename ) {
490
		return
491
			"\t<sitemap>\n" .
492
			"\t\t<loc>{$this->urlpath}$filename</loc>\n" .
493
			"\t\t<lastmod>{$this->timestamp}</lastmod>\n" .
494
			"\t</sitemap>\n";
495
	}
496
497
	/**
498
	 * Return the XML required to close a sitemap index file
499
	 *
500
	 * @return string
501
	 */
502
	function closeIndex() {
503
		return "</sitemapindex>\n";
504
	}
505
506
	/**
507
	 * Return the XML required to open a sitemap file
508
	 *
509
	 * @return string
510
	 */
511
	function openFile() {
512
		return $this->xmlHead() . '<urlset xmlns="' . $this->xmlSchema() . '">' . "\n";
513
	}
514
515
	/**
516
	 * Return the XML for a single sitemap entry
517
	 *
518
	 * @param string $url An RFC 2396 compliant URL
519
	 * @param string $date An ISO 8601 date
520
	 * @param string $priority A priority indicator, 0.0 - 1.0 inclusive with a 0.1 stepsize
521
	 * @return string
522
	 */
523
	function fileEntry( $url, $date, $priority ) {
524
		return
525
			"\t<url>\n" .
526
			// bug 34666: $url may contain bad characters such as ampersands.
527
			"\t\t<loc>" . htmlspecialchars( $url ) . "</loc>\n" .
528
			"\t\t<lastmod>$date</lastmod>\n" .
529
			"\t\t<priority>$priority</priority>\n" .
530
			"\t</url>\n";
531
	}
532
533
	/**
534
	 * Return the XML required to close a sitemap file
535
	 *
536
	 * @return string
537
	 */
538
	function closeFile() {
539
		return "</urlset>\n";
540
	}
541
542
	/**
543
	 * Populate $this->limit
544
	 *
545
	 * @param int $namespace
546
	 */
547
	function generateLimit( $namespace ) {
548
		// bug 17961: make a title with the longest possible URL in this namespace
549
		$title = Title::makeTitle( $namespace, str_repeat( "\xf0\xa8\xae\x81", 63 ) . "\xe5\x96\x83" );
550
551
		$this->limit = [
552
			strlen( $this->openFile() ),
553
			strlen( $this->fileEntry(
554
				$title->getCanonicalURL(),
555
				wfTimestamp( TS_ISO_8601, wfTimestamp() ),
556
				$this->priority( $namespace )
557
			) ),
558
			strlen( $this->closeFile() )
559
		];
560
	}
561
}
562
563
$maintClass = "GenerateSitemap";
564
require_once RUN_MAINTENANCE_IF_MAIN;
565
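
The file-rollover condition in `main()` (lines 344 to 346 of the listing) can be isolated as a small predicate. This is a sketch under stated assumptions rather than the script's actual structure: `needsRollover`, `URL_LIMIT`, and `SIZE_LIMIT` are hypothetical names that mirror `$url_limit` and `$size_limit`, and the "no file opened yet" case handled by `$i++ === 0` is left out.

```php
<?php
// Hypothetical constants mirroring the values set in execute().
const URL_LIMIT = 50000;               // same as $this->url_limit
const SIZE_LIMIT = 10 * 1024 * 1024;   // same as pow( 2, 20 ) * 10

/**
 * Decide whether the next entry must go into a new sitemap file.
 *
 * @param int $urlsInFile Entries already written to the current file
 * @param int $bytesInFile Bytes already written to the current file
 * @param int $maxEntryBytes Size of the longest possible entry
 * @param int $footerBytes Size of the closing </urlset> line
 * @return bool
 */
function needsRollover( int $urlsInFile, int $bytesInFile, int $maxEntryBytes, int $footerBytes ): bool {
	// Reserve room for a worst-case entry plus the footer, so the
	// finished file cannot exceed the size limit.
	return $urlsInFile >= URL_LIMIT
		|| $bytesInFile + $maxEntryBytes + $footerBytes > SIZE_LIMIT;
}

var_dump( needsRollover( 10, 1000, 500, 10 ) );              // bool(false)
var_dump( needsRollover( 50000, 1000, 500, 10 ) );           // bool(true)
var_dump( needsRollover( 10, SIZE_LIMIT - 100, 500, 10 ) );  // bool(true)
```

Checking against the worst-case entry size before writing, as `generateLimit()` prepares for, avoids having to measure the real entry first and then undo a write that would overflow the file.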