XmlDumpWriter (rating: C)

Complexity

Total Complexity 59

Size/Duplication

Total Lines 412
Duplicated Lines 2.43 %

Coupling/Cohesion

Components 2
Dependencies 10

Importance

Changes 0
Metric  Value
dl      10
loc     412
rs      6.1904
c       0
b       0
f       0
wmc     59
lcom    2
cbo     10

18 Methods

Rating  Name                Duplication  Size  Complexity
B       openStream()        0            24    1
A       siteInfo()          0            12    1
A       sitename()          0            4     1
A       dbname()            0            4     1
A       generator()         0            4     1
A       homelink()          0            3     1
A       caseSetting()       0            6     2
A       namespaces()        0            14    3
A       closeStream()       0            3     1
B       openPage()          0            25    5
A       closePage()         0            3     1
F       writeRevision()     5            77    20
B       writeLogItem()      5            36    5
A       writeTimestamp()    0            4     1
A       writeContributor()  0            11    3
B       writeUploads()      0            14    5
B       writeUpload()       0            37    4
A       canonicalTitle()    0            14    3

How to fix

Duplicated Code

Duplicate code is one of the most pungent code smells. A common rule of thumb is to restructure code once it is duplicated in three or more places.
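In this class, the flagged duplication is the `<comment>` handling in writeRevision() and writeLogItem(). Both methods emit a `deleted="deleted"` placeholder when the comment is suppressed, skip empty comments, and otherwise write an escaped element. Below is a minimal standalone sketch of Extract Method for that block. The helper name writeCommentElement() is hypothetical. htmlspecialchars() stands in for MediaWiki's Xml::element()/Xml::elementClean(), so the exact markup may differ from what those methods produce.

```php
<?php
/**
 * Hypothetical helper extracted from the duplicated <comment> blocks in
 * writeRevision() and writeLogItem(). The caller decides whether the comment
 * is suppressed (Revision::DELETED_COMMENT / LogPage::DELETED_COMMENT in the
 * original) and passes the indentation it already uses ("      " or "    ").
 */
function writeCommentElement( bool $isDeleted, string $comment, string $indent ): string {
	if ( $isDeleted ) {
		// Suppressed comments become an empty element flagged as deleted.
		return $indent . '<comment deleted="deleted"/>' . "\n";
	}
	if ( $comment === '' ) {
		// Empty comments are omitted, like the `!= ''` checks in the original.
		return '';
	}
	// Escape XML special characters and replace invalid UTF-8 (stand-in for Xml::elementClean()).
	$escaped = htmlspecialchars( $comment, ENT_XML1 | ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8' );
	return $indent . '<comment>' . $escaped . "</comment>\n";
}

// Each former duplicate collapses to a single call.
echo writeCommentElement( false, 'Fix typo & links', '      ' );
echo writeCommentElement( true, 'ignored', '    ' );
echo writeCommentElement( false, '', '    ' );
```

Passing the "is deleted" decision in as a boolean keeps the helper independent of the two different bit-flag constants the callers use.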

Common duplication problems and their corresponding solutions are:

Complex Class

 Tip:   Before tackling complexity, make sure that you eliminate any duplication first. This can often reduce the size of classes significantly.

Complex classes like XmlDumpWriter often do a lot of different things. To break such a class down, we need to identify a cohesive component within it. A common approach is to look for fields or methods that share the same prefixes or suffixes. You can also look at the cohesion graph to spot any unconnected or weakly connected components.
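You can try the prefix/suffix heuristic on the method list from the table above. The standalone sketch below is a crude, name-only approximation. It is not how Scrutinizer builds its cohesion graph. It groups camelCase names by their leading or trailing word and keeps only groups with two or more members.

```php
<?php
// Method names copied from the "18 Methods" table above.
$methods = [
	'openStream', 'siteInfo', 'sitename', 'dbname', 'generator', 'homelink',
	'caseSetting', 'namespaces', 'closeStream', 'openPage', 'closePage',
	'writeRevision', 'writeLogItem', 'writeTimestamp', 'writeContributor',
	'writeUploads', 'writeUpload', 'canonicalTitle',
];

/**
 * Groups names by the substring matched by $pattern and keeps only groups
 * with at least two members (candidate cohesive components).
 */
function groupByAffix( array $names, string $pattern ): array {
	$groups = [];
	foreach ( $names as $name ) {
		if ( preg_match( $pattern, $name, $match ) ) {
			$groups[$match[0]][] = $name;
		}
	}
	return array_filter( $groups, function ( array $members ): bool {
		return count( $members ) > 1;
	} );
}

// Leading lowercase word, e.g. "write" in writeRevision.
$byPrefix = groupByAffix( $methods, '/^[a-z]+/' );
// Trailing capitalized word, e.g. "Page" in openPage.
$bySuffix = groupByAffix( $methods, '/[A-Z][a-z0-9]*$/' );

foreach ( $byPrefix + $bySuffix as $affix => $members ) {
	echo $affix . ': ' . implode( ', ', $members ) . "\n";
}
```

The heuristic surfaces the write* family and the open/close pairs for Stream and Page. It misses the siteinfo group (sitename(), dbname(), generator(), homelink(), caseSetting(), namespaces()), which shows up only because siteInfo() calls all of them. Names are a hint, not proof of cohesion.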

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate and is often faster.
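One candidate for Extract Class here is the group of `<siteinfo>` helpers. The sketch below is hypothetical: the class name SiteInfoWriter is illustrative, and its values are passed to the constructor instead of being read from the globals the original uses ($wgSitename, $wgDBname, $wgVersion, $wgCapitalLinks). `<base>` and `<namespaces>` are left out because they depend on MediaWiki's Title and Language objects.

```php
<?php
/**
 * Hypothetical SiteInfoWriter: Extract Class applied to the <siteinfo>
 * helpers of XmlDumpWriter, with dependencies injected instead of global.
 */
final class SiteInfoWriter {
	private $sitename;
	private $dbname;
	private $version;
	private $capitalLinks;

	public function __construct( string $sitename, string $dbname, string $version, bool $capitalLinks ) {
		$this->sitename = $sitename;
		$this->dbname = $dbname;
		$this->version = $version;
		$this->capitalLinks = $capitalLinks;
	}

	/** Same layout as XmlDumpWriter::siteInfo(): two-space wrapper, four-space children. */
	public function siteInfo(): string {
		$info = [
			$this->element( 'sitename', $this->sitename ),
			$this->element( 'dbname', $this->dbname ),
			$this->element( 'generator', 'MediaWiki ' . $this->version ),
			$this->element( 'case', $this->capitalLinks ? 'first-letter' : 'case-sensitive' ),
		];
		return "  <siteinfo>\n    " . implode( "\n    ", $info ) . "\n  </siteinfo>\n";
	}

	/** Minimal escaping element builder standing in for Xml::element(). */
	private function element( string $name, string $text ): string {
		return "<$name>" . htmlspecialchars( $text, ENT_XML1 | ENT_QUOTES, 'UTF-8' ) . "</$name>";
	}
}

echo ( new SiteInfoWriter( 'Example Wiki', 'examplewiki', '1.28.0', true ) )->siteInfo();
```

XmlDumpWriter::siteInfo() could then delegate to an injected SiteInfoWriter, which would also let the `<siteinfo>` output be tested without setting globals.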

While breaking up the class, it is a good idea to analyze how other classes use XmlDumpWriter, and based on these observations, apply Extract Interface, too.
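Callers that only stream pages need openStream(), openPage(), closePage() and closeStream(). Below is a minimal sketch of Extract Interface for that subset. The interface name DumpStreamWriter and the MinimalDumpWriter implementation are hypothetical. The original methods declare no parameter or return types, so the type declarations here are also assumptions.

```php
<?php
/** Hypothetical interface extracted from the stream/page methods of XmlDumpWriter. */
interface DumpStreamWriter {
	public function openStream(): string;
	public function closeStream(): string;
	public function openPage( stdClass $row ): string;
	public function closePage(): string;
}

/** Minimal stand-in implementation with no MediaWiki dependencies. */
final class MinimalDumpWriter implements DumpStreamWriter {
	public function openStream(): string {
		return "<mediawiki>\n";
	}

	public function closeStream(): string {
		return "</mediawiki>\n";
	}

	public function openPage( stdClass $row ): string {
		$title = htmlspecialchars( (string)$row->page_title, ENT_XML1 | ENT_QUOTES, 'UTF-8' );
		return "  <page>\n    <title>$title</title>\n";
	}

	public function closePage(): string {
		return "  </page>\n";
	}
}

/** Callers depend only on the interface, so any implementation can be passed in. */
function dumpPages( DumpStreamWriter $writer, array $rows ): string {
	$out = $writer->openStream();
	foreach ( $rows as $row ) {
		$out .= $writer->openPage( $row ) . $writer->closePage();
	}
	return $out . $writer->closeStream();
}

echo dumpPages( new MinimalDumpWriter(), [ (object)[ 'page_title' => 'Main_Page' ] ] );
```

With such an interface in place, XmlDumpWriter and a test double could be swapped freely wherever the calling code only needs the page-streaming methods.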

<?php
/**
 * XmlDumpWriter
 *
 * Copyright © 2003, 2005, 2006 Brion Vibber <[email protected]>
 * https://www.mediawiki.org/
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 * http://www.gnu.org/copyleft/gpl.html
 *
 * @file
 */

/**
 * @ingroup Dump
 */
class XmlDumpWriter {
	/**
	 * Opens the XML output stream's root "<mediawiki>" element.
	 * This does not include an xml directive, so is safe to include
	 * as a subelement in a larger XML stream. Namespace and XML Schema
	 * references are included.
	 *
	 * Output will be encoded in UTF-8.
	 *
	 * @return string
	 */
	function openStream() {
		global $wgLanguageCode;
		$ver = WikiExporter::schemaVersion();
		return Xml::element( 'mediawiki', [
			'xmlns'              => "http://www.mediawiki.org/xml/export-$ver/",
			'xmlns:xsi'          => "http://www.w3.org/2001/XMLSchema-instance",
			/*
			 * When a new version of the schema is created, it needs staging on mediawiki.org.
			 * This requires a change in the operations/mediawiki-config git repo.
			 *
			 * Create a changeset like https://gerrit.wikimedia.org/r/#/c/149643/ in which
			 * you copy in the new xsd file.
			 *
			 * After it is reviewed, merged and deployed (sync-docroot), the index.html needs purging.
			 * echo "https://www.mediawiki.org/xml/index.html" | mwscript purgeList.php --wiki=aawiki
			 */
			'xsi:schemaLocation' => "http://www.mediawiki.org/xml/export-$ver/ " .
				"http://www.mediawiki.org/xml/export-$ver.xsd",
			'version'            => $ver,
			'xml:lang'           => $wgLanguageCode ],
			null ) .
			"\n" .
			$this->siteInfo();
	}

	/**
	 * @return string
	 */
	function siteInfo() {
		$info = [
			$this->sitename(),
			$this->dbname(),
			$this->homelink(),
			$this->generator(),
			$this->caseSetting(),
			$this->namespaces() ];
		return "  <siteinfo>\n    " .
			implode( "\n    ", $info ) .
			"\n  </siteinfo>\n";
	}

	/**
	 * @return string
	 */
	function sitename() {
		global $wgSitename;
		return Xml::element( 'sitename', [], $wgSitename );
	}

	/**
	 * @return string
	 */
	function dbname() {
		global $wgDBname;
		return Xml::element( 'dbname', [], $wgDBname );
	}

	/**
	 * @return string
	 */
	function generator() {
		global $wgVersion;
		return Xml::element( 'generator', [], "MediaWiki $wgVersion" );
	}

	/**
	 * @return string
	 */
	function homelink() {
		return Xml::element( 'base', [], Title::newMainPage()->getCanonicalURL() );
	}

	/**
	 * @return string
	 */
	function caseSetting() {
		global $wgCapitalLinks;
		// "case-insensitive" option is reserved for future
		$sensitivity = $wgCapitalLinks ? 'first-letter' : 'case-sensitive';
		return Xml::element( 'case', [], $sensitivity );
	}

	/**
	 * @return string
	 */
	function namespaces() {
		global $wgContLang;
		$spaces = "<namespaces>\n";
		foreach ( $wgContLang->getFormattedNamespaces() as $ns => $title ) {
			$spaces .= '      ' .
				Xml::element( 'namespace',
					[
						'key' => $ns,
						'case' => MWNamespace::isCapitalized( $ns ) ? 'first-letter' : 'case-sensitive',
					], $title ) . "\n";
		}
		$spaces .= "    </namespaces>";
		return $spaces;
	}

	/**
	 * Closes the output stream with the closing root element.
	 * Call when finished dumping things.
	 *
	 * @return string
	 */
	function closeStream() {
		return "</mediawiki>\n";
	}

	/**
	 * Opens a "<page>" section on the output stream, with data
	 * from the given database row.
	 *
	 * @param object $row
	 * @return string
	 */
	public function openPage( $row ) {
		$out = "  <page>\n";
		$title = Title::makeTitle( $row->page_namespace, $row->page_title );
		$out .= '    ' . Xml::elementClean( 'title', [], self::canonicalTitle( $title ) ) . "\n";
		$out .= '    ' . Xml::element( 'ns', [], strval( $row->page_namespace ) ) . "\n";
		$out .= '    ' . Xml::element( 'id', [], strval( $row->page_id ) ) . "\n";
		if ( $row->page_is_redirect ) {
			$page = WikiPage::factory( $title );
			$redirect = $page->getRedirectTarget();
			if ( $redirect instanceof Title && $redirect->isValidRedirectTarget() ) {
				$out .= '    ';
				$out .= Xml::element( 'redirect', [ 'title' => self::canonicalTitle( $redirect ) ] );
				$out .= "\n";
			}
		}

		if ( $row->page_restrictions != '' ) {
			$out .= '    ' . Xml::element( 'restrictions', [],
				strval( $row->page_restrictions ) ) . "\n";
		}

		Hooks::run( 'XmlDumpWriterOpenPage', [ $this, &$out, $row, $title ] );

		return $out;
	}

	/**
	 * Closes a "<page>" section on the output stream.
	 *
	 * @access private
	 * @return string
	 */
	function closePage() {
		return "  </page>\n";
	}

	/**
	 * Dumps a "<revision>" section on the output stream, with
	 * data filled in from the given database row.
	 *
	 * @param object $row
	 * @return string
	 * @access private
	 */
	function writeRevision( $row ) {

		$out = "    <revision>\n";
		$out .= "      " . Xml::element( 'id', null, strval( $row->rev_id ) ) . "\n";
		if ( isset( $row->rev_parent_id ) && $row->rev_parent_id ) {
			$out .= "      " . Xml::element( 'parentid', null, strval( $row->rev_parent_id ) ) . "\n";
		}

		$out .= $this->writeTimestamp( $row->rev_timestamp );

		if ( isset( $row->rev_deleted ) && ( $row->rev_deleted & Revision::DELETED_USER ) ) {
			$out .= "      " . Xml::element( 'contributor', [ 'deleted' => 'deleted' ] ) . "\n";
		} else {
			$out .= $this->writeContributor( $row->rev_user, $row->rev_user_text );
		}

		if ( isset( $row->rev_minor_edit ) && $row->rev_minor_edit ) {
			$out .= "      <minor/>\n";
		}
		if ( isset( $row->rev_deleted ) && ( $row->rev_deleted & Revision::DELETED_COMMENT ) ) {
			$out .= "      " . Xml::element( 'comment', [ 'deleted' => 'deleted' ] ) . "\n";
		} elseif ( $row->rev_comment != '' ) {
			$out .= "      " . Xml::elementClean( 'comment', [], strval( $row->rev_comment ) ) . "\n";
		}

		if ( isset( $row->rev_content_model ) && !is_null( $row->rev_content_model ) ) {
			$content_model = strval( $row->rev_content_model );
		} else {
			// probably using $wgContentHandlerUseDB = false;
			$title = Title::makeTitle( $row->page_namespace, $row->page_title );
			$content_model = ContentHandler::getDefaultModelFor( $title );
		}

		$content_handler = ContentHandler::getForModelID( $content_model );

		if ( isset( $row->rev_content_format ) && !is_null( $row->rev_content_format ) ) {
			$content_format = strval( $row->rev_content_format );
		} else {
			// probably using $wgContentHandlerUseDB = false;
			$content_format = $content_handler->getDefaultFormat();
		}

		$out .= "      " . Xml::element( 'model', null, strval( $content_model ) ) . "\n";
		$out .= "      " . Xml::element( 'format', null, strval( $content_format ) ) . "\n";

		$text = '';
		if ( isset( $row->rev_deleted ) && ( $row->rev_deleted & Revision::DELETED_TEXT ) ) {
			$out .= "      " . Xml::element( 'text', [ 'deleted' => 'deleted' ] ) . "\n";
		} elseif ( isset( $row->old_text ) ) {
			// Raw text from the database may have invalid chars
			$text = strval( Revision::getRevisionText( $row ) );
			$text = $content_handler->exportTransform( $text, $content_format );
			$out .= "      " . Xml::elementClean( 'text',
				[ 'xml:space' => 'preserve', 'bytes' => intval( $row->rev_len ) ],
				strval( $text ) ) . "\n";
		} else {
			// Stub output
			$out .= "      " . Xml::element( 'text',
				[ 'id' => $row->rev_text_id, 'bytes' => intval( $row->rev_len ) ],
				"" ) . "\n";
		}

		if ( isset( $row->rev_sha1 )
			&& $row->rev_sha1
			&& !( $row->rev_deleted & Revision::DELETED_TEXT )
		) {
			$out .= "      " . Xml::element( 'sha1', null, strval( $row->rev_sha1 ) ) . "\n";
		} else {
			$out .= "      <sha1/>\n";
		}

		// Avoid PHP 7.1 warning of passing $this by reference
		$writer = $this;
		Hooks::run( 'XmlDumpWriterWriteRevision', [ &$writer, &$out, $row, $text ] );

		$out .= "    </revision>\n";

		return $out;
	}

	/**
	 * Dumps a "<logitem>" section on the output stream, with
	 * data filled in from the given database row.
	 *
	 * @param object $row
	 * @return string
	 * @access private
	 */
	function writeLogItem( $row ) {

		$out = "  <logitem>\n";
		$out .= "    " . Xml::element( 'id', null, strval( $row->log_id ) ) . "\n";

		$out .= $this->writeTimestamp( $row->log_timestamp, "    " );

		if ( $row->log_deleted & LogPage::DELETED_USER ) {
			$out .= "    " . Xml::element( 'contributor', [ 'deleted' => 'deleted' ] ) . "\n";
		} else {
			$out .= $this->writeContributor( $row->log_user, $row->user_name, "    " );
		}

		if ( $row->log_deleted & LogPage::DELETED_COMMENT ) {
			$out .= "    " . Xml::element( 'comment', [ 'deleted' => 'deleted' ] ) . "\n";
		} elseif ( $row->log_comment != '' ) {
			$out .= "    " . Xml::elementClean( 'comment', null, strval( $row->log_comment ) ) . "\n";
		}

		$out .= "    " . Xml::element( 'type', null, strval( $row->log_type ) ) . "\n";
		$out .= "    " . Xml::element( 'action', null, strval( $row->log_action ) ) . "\n";

		if ( $row->log_deleted & LogPage::DELETED_ACTION ) {
			$out .= "    " . Xml::element( 'text', [ 'deleted' => 'deleted' ] ) . "\n";
		} else {
			$title = Title::makeTitle( $row->log_namespace, $row->log_title );
			$out .= "    " . Xml::elementClean( 'logtitle', null, self::canonicalTitle( $title ) ) . "\n";
			$out .= "    " . Xml::elementClean( 'params',
				[ 'xml:space' => 'preserve' ],
				strval( $row->log_params ) ) . "\n";
		}

		$out .= "  </logitem>\n";

		return $out;
	}

	/**
	 * @param string $timestamp
	 * @param string $indent Default to six spaces
	 * @return string
	 */
	function writeTimestamp( $timestamp, $indent = "      " ) {
		$ts = wfTimestamp( TS_ISO_8601, $timestamp );
		return $indent . Xml::element( 'timestamp', null, $ts ) . "\n";
Security Bug introduced by
It seems like $ts, defined by wfTimestamp(TS_ISO_8601, $timestamp) on line 330, can also be of type false; however, Xml::element() only seems to accept string. Did you maybe forget to handle an error condition?

This check looks for type mismatches where the missing type is false. This is usually indicative of an error condition.

Consider the following example:

<?php

function parseDate($date)
{
    if ($date !== null) {
        return new DateTime($date);
    }

    return false;
}

This function either returns a new DateTime object or false if there was an error. This is a typical pattern in PHP programming to show that an error has occurred without raising an exception. The calling code should check for this returned false before passing the value on to another function or method that may not be able to handle a false.
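Applied to writeTimestamp(), that check could look like the standalone sketch below. toIso8601() is a hypothetical stand-in for wfTimestamp( TS_ISO_8601, ... ). It accepts only the 14-digit YYYYMMDDHHMMSS form and returns false for anything else, which is the failure case the analyzer points at. Throwing an exception is one possible policy. The real code might prefer to log and skip the revision.

```php
<?php
/**
 * Hypothetical stand-in for wfTimestamp( TS_ISO_8601, $ts ).
 * Accepts only the 14-digit YYYYMMDDHHMMSS form; returns false otherwise.
 *
 * @return string|false
 */
function toIso8601( string $timestamp ) {
	if ( !preg_match( '/^\d{14}$/', $timestamp ) ) {
		return false;
	}
	$dt = DateTime::createFromFormat( 'YmdHis', $timestamp, new DateTimeZone( 'UTC' ) );
	return $dt === false ? false : $dt->format( 'Y-m-d\TH:i:s\Z' );
}

function writeTimestamp( string $timestamp, string $indent = '      ' ): string {
	$ts = toIso8601( $timestamp );
	if ( $ts === false ) {
		// Handle the error condition instead of passing false on to the XML builder.
		throw new InvalidArgumentException( "Invalid timestamp: $timestamp" );
	}
	return $indent . '<timestamp>' . $ts . "</timestamp>\n";
}

echo writeTimestamp( '20170102030405' );
```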
	}

	/**
	 * @param int $id
	 * @param string $text
	 * @param string $indent Default to six spaces
	 * @return string
	 */
	function writeContributor( $id, $text, $indent = "      " ) {
		$out = $indent . "<contributor>\n";
		if ( $id || !IP::isValid( $text ) ) {
			$out .= $indent . "  " . Xml::elementClean( 'username', null, strval( $text ) ) . "\n";
			$out .= $indent . "  " . Xml::element( 'id', null, strval( $id ) ) . "\n";
		} else {
			$out .= $indent . "  " . Xml::elementClean( 'ip', null, strval( $text ) ) . "\n";
		}
		$out .= $indent . "</contributor>\n";
		return $out;
	}

	/**
	 * Warning! This data is potentially inconsistent. :(
	 * @param object $row
	 * @param bool $dumpContents
	 * @return string
	 */
	function writeUploads( $row, $dumpContents = false ) {
		if ( $row->page_namespace == NS_FILE ) {
			$img = wfLocalFile( $row->page_title );
			if ( $img && $img->exists() ) {
				$out = '';
				foreach ( array_reverse( $img->getHistory() ) as $ver ) {
					$out .= $this->writeUpload( $ver, $dumpContents );
				}
				$out .= $this->writeUpload( $img, $dumpContents );
				return $out;
			}
		}
		return '';
	}

	/**
	 * @param File $file
	 * @param bool $dumpContents
	 * @return string
	 */
	function writeUpload( $file, $dumpContents = false ) {
		if ( $file->isOld() ) {
			$archiveName = "      " .
				Xml::element( 'archivename', null, $file->getArchiveName() ) . "\n";
Bug introduced by
It seems like your code targets a specific subtype rather than the parent class File, since the method getArchiveName() only exists in the following subclass of File: OldLocalFile. Maybe you want to add an explicit instanceof check for it?

Let’s take a look at an example:

abstract class User
{
    /** @return string */
    abstract public function getPassword();
}

class MyUser extends User
{
    public function getPassword()
    {
        // return something
    }

    public function getDisplayName()
    {
        // return some name.
    }
}

class AuthSystem
{
    public function authenticate(User $user)
    {
        $this->logger->info(sprintf('Authenticating %s.', $user->getDisplayName()));
        // do something.
    }
}

In the above example, the authenticate() method works fine as long as you only pass instances of MyUser. However, if you also want to pass a different subclass of User that does not have a getDisplayName() method, the code will break.

Available Fixes

  1. Change the type-hint for the parameter:

    class AuthSystem
    {
        public function authenticate(MyUser $user) { /* ... */ }
    }

  2. Add an additional type-check:

    class AuthSystem
    {
        public function authenticate(User $user)
        {
            if ($user instanceof MyUser) {
                $this->logger->info(/** ... */);
            }

            // or alternatively
            if ( ! $user instanceof MyUser) {
                throw new \LogicException(
                    '$user must be an instance of MyUser, '
                   .'other instances are not supported.'
                );
            }

        }
    }

    Note: PHP Analyzer uses reverse abstract interpretation to narrow down the types inside the if block in such a case.

  3. Add the method to the parent class:

    abstract class User
    {
        /** @return string */
        abstract public function getPassword();

        /** @return string */
        abstract public function getDisplayName();
    }
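For writeUpload(), fix 2 (an explicit type check) might look like the standalone sketch below. FileStub, OldFileStub and CurrentFileStub are hypothetical stand-ins for File, OldLocalFile and a current file version; they are not MediaWiki classes. In the listed code the call already sits behind $file->isOld(). Assuming every old file is an OldLocalFile, the instanceof check mostly narrows the type for the analyzer rather than changing behavior.

```php
<?php
// Hypothetical stand-ins for the File / OldLocalFile hierarchy.
abstract class FileStub {
	abstract public function isOld(): bool;
}

final class OldFileStub extends FileStub {
	private $archiveName;

	public function __construct( string $archiveName ) {
		$this->archiveName = $archiveName;
	}

	public function isOld(): bool {
		return true;
	}

	// Only the "old" subclass declares this method, as with OldLocalFile.
	public function getArchiveName(): string {
		return $this->archiveName;
	}
}

final class CurrentFileStub extends FileStub {
	public function isOld(): bool {
		return false;
	}
}

/**
 * Builds the optional <archivename> line. The instanceof check narrows the
 * type, so getArchiveName() is only called on the subclass that declares it.
 */
function archiveNameElement( FileStub $file ): string {
	if ( $file instanceof OldFileStub ) {
		$name = htmlspecialchars( $file->getArchiveName(), ENT_XML1 | ENT_QUOTES, 'UTF-8' );
		return "      <archivename>$name</archivename>\n";
	}
	return '';
}

echo archiveNameElement( new OldFileStub( '20170101000000!Example.png' ) );
```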
		} else {
			$archiveName = '';
		}
		if ( $dumpContents ) {
			$be = $file->getRepo()->getBackend();
			# Dump file as base64
			# Uses only XML-safe characters, so does not need escaping
			# @todo Too bad this loads the contents into memory (script might swap)
			$contents = '      <contents encoding="base64">' .
				chunk_split( base64_encode(
					$be->getFileContents( [ 'src' => $file->getPath() ] ) ) ) .
				"      </contents>\n";
		} else {
			$contents = '';
		}
		if ( $file->isDeleted( File::DELETED_COMMENT ) ) {
			$comment = Xml::element( 'comment', [ 'deleted' => 'deleted' ] );
		} else {
			$comment = Xml::elementClean( 'comment', null, $file->getDescription() );
		}
		return "    <upload>\n" .
			$this->writeTimestamp( $file->getTimestamp() ) .
Bug introduced by
It seems like $file->getTimestamp(), targeting File::getTimestamp(), can also be of type boolean; however, XmlDumpWriter::writeTimestamp() only seems to accept string. Maybe add an additional type check?

This check looks at variables that are passed on to other methods.

If the outgoing method call has stricter type requirements than the method itself, an issue is raised.

An additional type check may prevent trouble.
			$this->writeContributor( $file->getUser( 'id' ), $file->getUser( 'text' ) ) .
			"      " . $comment . "\n" .
			"      " . Xml::element( 'filename', null, $file->getName() ) . "\n" .
			$archiveName .
			"      " . Xml::element( 'src', null, $file->getCanonicalUrl() ) . "\n" .
Security Bug introduced by
It seems like $file->getCanonicalUrl(), targeting File::getCanonicalUrl(), can also be of type false; however, Xml::element() only seems to accept string. Did you maybe forget to handle an error condition?
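This warning and the earlier getTimestamp() one share a cause: a value that may be false, or some other non-string, reaches a parameter that only accepts strings. One way to handle both the same way is a small fail-fast guard, sketched below. requireString() and srcElement() are hypothetical helper names. Throwing UnexpectedValueException is an assumed policy, not one the original code defines.

```php
<?php
/**
 * Hypothetical guard: returns $value unchanged if it is a string, otherwise
 * throws, naming the value that was expected. Intended for results such as
 * File::getCanonicalUrl() or File::getTimestamp() before they reach
 * string-only code.
 */
function requireString( $value, string $what ): string {
	if ( !is_string( $value ) ) {
		throw new UnexpectedValueException( "$what: expected string, got " . gettype( $value ) );
	}
	return $value;
}

// Hypothetical example: building the <src> line with a checked URL.
function srcElement( $url ): string {
	$url = requireString( $url, 'canonical URL' );
	return '      <src>' . htmlspecialchars( $url, ENT_XML1 | ENT_QUOTES, 'UTF-8' ) . "</src>\n";
}

echo srcElement( 'https://example.org/images/Example.png' );
```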
			"      " . Xml::element( 'size', null, $file->getSize() ) . "\n" .
			"      " . Xml::element( 'sha1base36', null, $file->getSha1() ) . "\n" .
			"      " . Xml::element( 'rel', null, $file->getRel() ) . "\n" .
			$contents .
			"    </upload>\n";
	}

	/**
	 * Return prefixed text form of title, but using the content language's
	 * canonical namespace. This skips any special-casing such as gendered
	 * user namespaces -- which while useful, are not yet listed in the
	 * XML "<siteinfo>" data so are unsafe in export.
	 *
	 * @param Title $title
	 * @return string
	 * @since 1.18
	 */
	public static function canonicalTitle( Title $title ) {
		if ( $title->isExternal() ) {
			return $title->getPrefixedText();
		}

		global $wgContLang;
		$prefix = $wgContLang->getFormattedNsText( $title->getNamespace() );

		if ( $prefix !== '' ) {
			$prefix .= ':';
		}

		return $prefix . $title->getText();
	}
}
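The @todo in writeUpload() notes that base64-encoding a whole file loads its contents into memory. One possible direction is to let PHP's convert.base64-encode stream filter encode the data while it is copied. The sketch below uses plain PHP streams rather than MediaWiki's FileBackend; how to get a readable stream from the backend is left open. The copyAsBase64() name is hypothetical. The line-length and line-break-chars parameters approximate chunk_split()'s default of 76-character, CRLF-terminated lines, but the final line break is not guaranteed to match chunk_split() exactly.

```php
<?php
/**
 * Copies $in to $out as base64, encoding chunk by chunk through a read
 * filter instead of building the whole encoded string in memory.
 *
 * @param resource $in  Readable stream (e.g. the original file).
 * @param resource $out Writable stream (e.g. the dump output).
 * @return int Number of encoded bytes written.
 */
function copyAsBase64( $in, $out ): int {
	$filter = stream_filter_append( $in, 'convert.base64-encode', STREAM_FILTER_READ,
		[ 'line-length' => 76, 'line-break-chars' => "\r\n" ] );
	$written = stream_copy_to_stream( $in, $out );
	stream_filter_remove( $filter );
	if ( $written === false ) {
		throw new RuntimeException( 'Copying the file contents failed' );
	}
	return $written;
}

// Demo: in-memory streams stand in for the stored file and the dump output.
$data = str_repeat( "\x00\xFFbinary", 200 );
$in = fopen( 'php://temp', 'r+' );
fwrite( $in, $data );
rewind( $in );
$out = fopen( 'php://temp', 'r+' );

fwrite( $out, '      <contents encoding="base64">' );
copyAsBase64( $in, $out );
fwrite( $out, "      </contents>\n" );
```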