Completed
Branch master (9259dd)
by
unknown
27:26
created

WikiExporter::dumpFrom()   F

Complexity

Conditions 31
Paths 4186

Size

Total Lines 167
Code Lines 91

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0

Metric  Value
cc      31
eloc    91
nc      4186
nop     1
dl      0
loc     167
rs      2
c       0
b       0
f       0

How to fix

Long Method

Small methods make your code easier to understand, especially when they have good names. A small method is also usually much easier to name well.

If you find yourself adding comments inside a method's body, that is usually a sign that the commented part should be extracted into a new method, with the comment as a starting point for the new method's name.
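
As a hedged illustration of that advice, the `# Set time order` comment inside `dumpFrom()` could become a small method of its own. This is a sketch, not MediaWiki code: the function name `timeOrderFor()` and the shape of its return value are assumptions made for this example.

```php
<?php
/**
 * Hypothetical helper extracted from the "# Set time order" block.
 * Returns the comparison operator and ORDER BY clause for a direction.
 *
 * @param string $dir "asc", or anything else (treated as descending)
 * @return array Keys: 'op' and 'orderBy'
 */
function timeOrderFor( string $dir ): array {
	if ( $dir === 'asc' ) {
		return [ 'op' => '>', 'orderBy' => 'rev_timestamp ASC' ];
	}
	return [ 'op' => '<', 'orderBy' => 'rev_timestamp DESC' ];
}

// Usage: the call site no longer needs a comment to explain the branch.
$order = timeOrderFor( 'asc' );
echo $order['op'] . ' ' . $order['orderBy'] . "\n";
```

The comment has turned into the method name, and the branch can now be tested without running a query.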

Commonly applied refactorings include Extract Method.

1
<?php
2
/**
3
 * Base class for exporting
4
 *
5
 * Copyright © 2003, 2005, 2006 Brion Vibber <[email protected]>
6
 * https://www.mediawiki.org/
7
 *
8
 * This program is free software; you can redistribute it and/or modify
9
 * it under the terms of the GNU General Public License as published by
10
 * the Free Software Foundation; either version 2 of the License, or
11
 * (at your option) any later version.
12
 *
13
 * This program is distributed in the hope that it will be useful,
14
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
15
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
16
 * GNU General Public License for more details.
17
 *
18
 * You should have received a copy of the GNU General Public License along
19
 * with this program; if not, write to the Free Software Foundation, Inc.,
20
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
21
 * http://www.gnu.org/copyleft/gpl.html
22
 *
23
 * @file
24
 */
25
26
/**
27
 * @defgroup Dump Dump
28
 */
29
30
/**
31
 * @ingroup SpecialPage Dump
32
 */
33
class WikiExporter {
34
	/** @var bool Return distinct author list (when not returning full history) */
35
	public $list_authors = false;
36
37
	/** @var bool */
38
	public $dumpUploads = false;
39
40
	/** @var bool */
41
	public $dumpUploadFileContents = false;
42
43
	/** @var string */
44
	public $author_list = "";
45
46
	const FULL = 1;
47
	const CURRENT = 2;
48
	const STABLE = 4; // extension defined
49
	const LOGS = 8;
50
	const RANGE = 16;
51
52
	const BUFFER = 0;
53
	const STREAM = 1;
54
55
	const TEXT = 0;
56
	const STUB = 1;
57
58
	/** @var int */
59
	public $buffer;
60
61
	/** @var int */
62
	public $text;
63
64
	/** @var DumpOutput */
65
	public $sink;
66
67
	/**
68
	 * Returns the export schema version.
69
	 * @return string
70
	 */
71
	public static function schemaVersion() {
72
		return "0.10";
73
	}
74
75
	/**
76
	 * If using WikiExporter::STREAM to stream a large amount of data,
77
	 * provide a database connection which is not managed by
78
	 * LoadBalancer to read from: some history blob types will
79
	 * make additional queries to pull source data while the
80
	 * main query is still running.
81
	 *
82
	 * @param IDatabase $db
83
	 * @param int|array $history One of WikiExporter::FULL, WikiExporter::CURRENT,
84
	 *   WikiExporter::RANGE or WikiExporter::STABLE, or an associative array:
85
	 *   - offset: non-inclusive offset at which to start the query
86
	 *   - limit: maximum number of rows to return
87
	 *   - dir: "asc" or "desc" timestamp order
88
	 * @param int $buffer One of WikiExporter::BUFFER or WikiExporter::STREAM
89
	 * @param int $text One of WikiExporter::TEXT or WikiExporter::STUB
90
	 */
91
	function __construct( $db, $history = WikiExporter::CURRENT,
92
			$buffer = WikiExporter::BUFFER, $text = WikiExporter::TEXT ) {
93
		$this->db = $db;
Bug: The property db does not exist. Did you maybe forget to declare it?

In PHP it is possible to write to properties without declaring them. For example, the following is perfectly valid PHP code:

class MyClass { }

$x = new MyClass();
$x->foo = true;

Generally, it is good practice to explicitly declare properties to avoid accidental typos and provide IDE auto-completion:

class MyClass {
    public $foo;
}

$x = new MyClass();
$x->foo = true;
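
Applied to the class under review, the same fix might look like the sketch below. `ExporterSketch` is a hypothetical stand-in for `WikiExporter`, and the types in the doc comments are assumptions; `property_exists()` is used only to show that the properties are now part of the class definition. As of PHP 8.2, creating undeclared (dynamic) properties is also deprecated, which is a further reason to declare them.

```php
<?php
// Hypothetical stand-in for WikiExporter, reduced to the three flagged properties.
class ExporterSketch {
	/** @var object|null Database connection (assumed type) */
	public $db;

	/** @var int|array History mode, as documented on the constructor */
	public $history;

	/** @var object|null Writer instance (assumed type) */
	public $writer;

	public function __construct( $db, $history = 2 ) {
		// Both assignments now target declared properties.
		$this->db = $db;
		$this->history = $history;
	}
}

// property_exists() reports declared properties even before any assignment.
var_dump( property_exists( ExporterSketch::class, 'writer' ) );
```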
94
		$this->history = $history;
Bug: The property history does not exist. Did you maybe forget to declare it?

95
		$this->buffer = $buffer;
96
		$this->writer = new XmlDumpWriter();
Bug: The property writer does not exist. Did you maybe forget to declare it?

97
		$this->sink = new DumpOutput();
98
		$this->text = $text;
99
	}
100
101
	/**
102
	 * Set the DumpOutput or DumpFilter object which will receive
103
	 * various row objects and XML output for filtering. Filters
104
	 * can be chained or used as callbacks.
105
	 *
106
	 * @param DumpOutput $sink
107
	 */
108
	public function setOutputSink( &$sink ) {
109
		$this->sink =& $sink;
110
	}
111
112
	public function openStream() {
113
		$output = $this->writer->openStream();
114
		$this->sink->writeOpenStream( $output );
115
	}
116
117
	public function closeStream() {
118
		$output = $this->writer->closeStream();
119
		$this->sink->writeCloseStream( $output );
120
	}
121
122
	/**
123
	 * Dumps a series of page and revision records for all pages
124
	 * in the database, either including complete history or only
125
	 * the most recent version.
126
	 */
127
	public function allPages() {
128
		$this->dumpFrom( '' );
129
	}
130
131
	/**
132
	 * Dumps a series of page and revision records for those pages
133
	 * in the database falling within the page_id range given.
134
	 * @param int $start Inclusive lower limit (this id is included)
135
	 * @param int $end Exclusive upper limit (this id is not included)
136
	 *   If 0, no upper limit.
137
	 */
138 View Code Duplication
	public function pagesByRange( $start, $end ) {
139
		$condition = 'page_id >= ' . intval( $start );
140
		if ( $end ) {
141
			$condition .= ' AND page_id < ' . intval( $end );
142
		}
143
		$this->dumpFrom( $condition );
144
	}
145
146
	/**
147
	 * Dumps a series of page and revision records for those pages
148
	 * in the database with revisions falling within the rev_id range given.
149
	 * @param int $start Inclusive lower limit (this id is included)
150
	 * @param int $end Exclusive upper limit (this id is not included)
151
	 *   If 0, no upper limit.
152
	 */
153 View Code Duplication
	public function revsByRange( $start, $end ) {
154
		$condition = 'rev_id >= ' . intval( $start );
155
		if ( $end ) {
156
			$condition .= ' AND rev_id < ' . intval( $end );
157
		}
158
		$this->dumpFrom( $condition );
159
	}
160
161
	/**
162
	 * @param Title $title
163
	 */
164
	public function pageByTitle( $title ) {
165
		$this->dumpFrom(
166
			'page_namespace=' . $title->getNamespace() .
167
			' AND page_title=' . $this->db->addQuotes( $title->getDBkey() ) );
168
	}
169
170
	/**
171
	 * @param string $name
172
	 * @throws MWException
173
	 */
174
	public function pageByName( $name ) {
175
		$title = Title::newFromText( $name );
176
		if ( is_null( $title ) ) {
177
			throw new MWException( "Can't export invalid title" );
178
		} else {
179
			$this->pageByTitle( $title );
180
		}
181
	}
182
183
	/**
184
	 * @param array $names
185
	 */
186
	public function pagesByName( $names ) {
187
		foreach ( $names as $name ) {
188
			$this->pageByName( $name );
189
		}
190
	}
191
192
	public function allLogs() {
193
		$this->dumpFrom( '' );
194
	}
195
196
	/**
197
	 * @param int $start
198
	 * @param int $end
199
	 */
200 View Code Duplication
	public function logsByRange( $start, $end ) {
201
		$condition = 'log_id >= ' . intval( $start );
202
		if ( $end ) {
203
			$condition .= ' AND log_id < ' . intval( $end );
204
		}
205
		$this->dumpFrom( $condition );
206
	}
207
208
	/**
209
	 * Generates the distinct list of authors of an article
210
	 * Not called by default (depends on $this->list_authors)
211
	 * Can be set by Special:Export when not exporting whole history
212
	 *
213
	 * @param array $cond
214
	 */
215
	protected function do_list_authors( $cond ) {
216
		$this->author_list = "<contributors>";
217
		// rev_deleted
218
219
		$res = $this->db->select(
220
			[ 'page', 'revision' ],
221
			[ 'DISTINCT rev_user_text', 'rev_user' ],
222
			[
223
				$this->db->bitAnd( 'rev_deleted', Revision::DELETED_USER ) . ' = 0',
224
				$cond,
225
				'page_id = rev_id',
226
			],
227
			__METHOD__
228
		);
229
230
		foreach ( $res as $row ) {
231
			$this->author_list .= "<contributor>" .
232
				"<username>" .
233
				htmlentities( $row->rev_user_text ) .
234
				"</username>" .
235
				"<id>" .
236
				$row->rev_user .
237
				"</id>" .
238
				"</contributor>";
239
		}
240
		$this->author_list .= "</contributors>";
241
	}
242
243
	/**
244
	 * @param string $cond
245
	 * @throws MWException
246
	 * @throws Exception
247
	 */
248
	protected function dumpFrom( $cond = '' ) {
249
		# For logging dumps...
250
		if ( $this->history & self::LOGS ) {
251
			$where = [ 'user_id = log_user' ];
252
			# Hide private logs
253
			$hideLogs = LogEventsList::getExcludeClause( $this->db );
254
			if ( $hideLogs ) {
Bug, Best Practice: The expression $hideLogs of type string|false is loosely compared to true; this is ambiguous if the string can be empty. You might want to explicitly use !== false instead.

In PHP, under loose comparison (like ==, or !=, or switch conditions), values of different types might be equal.

For string values, the empty string '' is a special case, in particular the following results might be unexpected:

''   == false // true
''   == null  // true
'ab' == false // false
'ab' == null  // false

// It is often better to use strict comparison
'' === false // false
'' === null  // false
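
For the `$hideLogs` check flagged here, a strict comparison could look like the following sketch. The helper name `appendClause()` is hypothetical, and treating an empty string as "nothing to add" is an assumption about the intended behaviour, not something the analyzer output states.

```php
<?php
/**
 * Hypothetical helper: append a WHERE clause only when one was returned.
 *
 * @param string[] $where Existing conditions
 * @param string|false $clause Clause to add, or false when there is none
 * @return string[]
 */
function appendClause( array $where, $clause ): array {
	// Strict checks spell out both cases instead of relying on truthiness.
	if ( $clause !== false && $clause !== '' ) {
		$where[] = $clause;
	}
	return $where;
}

$where = appendClause( [ 'user_id = log_user' ], false );
$where = appendClause( $where, "log_type != 'suppress'" );
print_r( $where );
```

Unlike the loose `if ( $hideLogs )` check, this version would also keep a clause such as the string '0', which PHP treats as falsy under loose comparison.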
255
				$where[] = $hideLogs;
256
			}
257
			# Add on any caller specified conditions
258
			if ( $cond ) {
259
				$where[] = $cond;
260
			}
261
			# Get logging table name for logging.* clause
262
			$logging = $this->db->tableName( 'logging' );
263
264
			if ( $this->buffer == WikiExporter::STREAM ) {
265
				$prev = $this->db->bufferResults( false );
266
			}
267
			$result = null; // Assuring $result is not undefined, if exception occurs early
268
			try {
269
				$result = $this->db->select( [ 'logging', 'user' ],
270
					[ "{$logging}.*", 'user_name' ], // grab the user name
271
					$where,
272
					__METHOD__,
273
					[ 'ORDER BY' => 'log_id', 'USE INDEX' => [ 'logging' => 'PRIMARY' ] ]
274
				);
275
				$this->outputLogStream( $result );
276
				if ( $this->buffer == WikiExporter::STREAM ) {
277
					$this->db->bufferResults( $prev );
Bug: The variable $prev does not seem to be defined for all execution paths leading up to this point.

If you define a variable conditionally, it can happen that it is not defined for all execution paths.

Let’s take a look at an example:

function myFunction($a) {
    switch ($a) {
        case 'foo':
            $x = 1;
            break;

        case 'bar':
            $x = 2;
            break;
    }

    // $x is potentially undefined here.
    echo $x;
}

In the above example, the variable $x is defined if you pass “foo” or “bar” as argument for $a. However, since the switch statement has no default case statement, if you pass any other value, the variable $x would be undefined.

Available Fixes

  1. Check for existence of the variable explicitly:

    function myFunction($a) {
        switch ($a) {
            case 'foo':
                $x = 1;
                break;
    
            case 'bar':
                $x = 2;
                break;
        }
    
        if (isset($x)) { // Make sure it's always set.
            echo $x;
        }
    }
    
  2. Define a default value for the variable:

    function myFunction($a) {
        $x = ''; // Set a default which gets overridden for certain paths.
        switch ($a) {
            case 'foo':
                $x = 1;
                break;
    
            case 'bar':
                $x = 2;
                break;
        }
    
        echo $x;
    }
    
  3. Add a value for the missing path:

    function myFunction($a) {
        switch ($a) {
            case 'foo':
                $x = 1;
                break;
    
            case 'bar':
                $x = 2;
                break;
    
            // We add support for the missing case.
            default:
                $x = '';
                break;
        }
    
        echo $x;
    }
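
In `dumpFrom()`, the second option maps onto initialising `$prev` before the `if`. The sketch below combines that with `try`/`finally` so that the buffer mode is restored on both the normal and the exceptional path. This is a swapped-in technique, not what the reviewed code does (it uses nested `try`/`catch` blocks). `FakeConnection` and `withBufferMode()` are hypothetical names; only the `bufferResults()` call shape (set a mode, get the previous one back) is taken from the listing.

```php
<?php
// Hypothetical stand-in for the database handle; mimics bufferResults() only.
class FakeConnection {
	private $buffered = true;

	/** Sets the buffer mode and returns the previous one. */
	public function bufferResults( $mode ) {
		$prev = $this->buffered;
		$this->buffered = (bool)$mode;
		return $prev;
	}

	public function isBuffered() {
		return $this->buffered;
	}
}

/**
 * Runs $work, switching to unbuffered mode first when $stream is true.
 */
function withBufferMode( FakeConnection $db, bool $stream, callable $work ) {
	$prev = null; // Defined on every path, so the warning no longer applies.
	if ( $stream ) {
		$prev = $db->bufferResults( false );
	}
	try {
		return $work( $db );
	} finally {
		// Runs after a normal return and after an exception alike.
		if ( $prev !== null ) {
			$db->bufferResults( $prev );
		}
	}
}

$db = new FakeConnection();
try {
	withBufferMode( $db, true, function () {
		throw new RuntimeException( 'query failed' );
	} );
} catch ( RuntimeException $e ) {
	// The caller still sees the original exception.
}
var_dump( $db->isBuffered() );
```

One trade-off to keep in mind: the reviewed code deliberately swallows secondary exceptions raised during cleanup, whereas here an exception thrown by the cleanup itself would propagate to the caller in place of the original one.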
    
278
				}
279
			} catch ( Exception $e ) {
280
				// Throwing the exception does not reliably free the resultset, and
281
				// would also leave the connection in unbuffered mode.
282
283
				// Freeing result
284
				try {
285
					if ( $result ) {
286
						$result->free();
287
					}
288
				} catch ( Exception $e2 ) {
289
					// Already in panic mode -> ignoring $e2 as $e has
290
					// higher priority
291
				}
292
293
				// Putting database back in previous buffer mode
294
				try {
295
					if ( $this->buffer == WikiExporter::STREAM ) {
296
						$this->db->bufferResults( $prev );
297
					}
298
				} catch ( Exception $e2 ) {
299
					// Already in panic mode -> ignoring $e2 as $e has
300
					// higher priority
301
				}
302
303
				// Inform caller about problem
304
				throw $e;
305
			}
306
		# For page dumps...
307
		} else {
308
			$tables = [ 'page', 'revision' ];
309
			$opts = [ 'ORDER BY' => 'page_id ASC' ];
310
			$opts['USE INDEX'] = [];
311
			$join = [];
312
			if ( is_array( $this->history ) ) {
313
				# Time offset/limit for all pages/history...
314
				$revJoin = 'page_id=rev_page';
315
				# Set time order
316
				if ( $this->history['dir'] == 'asc' ) {
317
					$op = '>';
318
					$opts['ORDER BY'] = 'rev_timestamp ASC';
319
				} else {
320
					$op = '<';
321
					$opts['ORDER BY'] = 'rev_timestamp DESC';
322
				}
323
				# Set offset
324
				if ( !empty( $this->history['offset'] ) ) {
325
					$revJoin .= " AND rev_timestamp $op " .
326
						$this->db->addQuotes( $this->db->timestamp( $this->history['offset'] ) );
327
				}
328
				$join['revision'] = [ 'INNER JOIN', $revJoin ];
329
				# Set query limit
330
				if ( !empty( $this->history['limit'] ) ) {
331
					$opts['LIMIT'] = intval( $this->history['limit'] );
332
				}
333
			} elseif ( $this->history & WikiExporter::FULL ) {
334
				# Full history dumps...
335
				$join['revision'] = [ 'INNER JOIN', 'page_id=rev_page' ];
336
			} elseif ( $this->history & WikiExporter::CURRENT ) {
337
				# Latest revision dumps...
338
				if ( $this->list_authors && $cond != '' ) { // List authors, if so desired
339
					$this->do_list_authors( $cond );
340
				}
341
				$join['revision'] = [ 'INNER JOIN', 'page_id=rev_page AND page_latest=rev_id' ];
342
			} elseif ( $this->history & WikiExporter::STABLE ) {
343
				# "Stable" revision dumps...
344
				# Default JOIN, to be overridden...
345
				$join['revision'] = [ 'INNER JOIN', 'page_id=rev_page AND page_latest=rev_id' ];
346
				# One, and only one hook should set this, and return false
347
				if ( Hooks::run( 'WikiExporter::dumpStableQuery', [ &$tables, &$opts, &$join ] ) ) {
348
					throw new MWException( __METHOD__ . " given invalid history dump type." );
349
				}
350
			} elseif ( $this->history & WikiExporter::RANGE ) {
351
				# Dump of revisions within a specified range
352
				$join['revision'] = [ 'INNER JOIN', 'page_id=rev_page' ];
353
				$opts['ORDER BY'] = [ 'rev_page ASC', 'rev_id ASC' ];
354
			} else {
355
				# Unknown history specification parameter?
356
				throw new MWException( __METHOD__ . " given invalid history dump type." );
357
			}
358
			# Query optimization hacks
359
			if ( $cond == '' ) {
360
				$opts[] = 'STRAIGHT_JOIN';
361
				$opts['USE INDEX']['page'] = 'PRIMARY';
362
			}
363
			# Build text join options
364
			if ( $this->text != WikiExporter::STUB ) { // 1-pass
365
				$tables[] = 'text';
366
				$join['text'] = [ 'INNER JOIN', 'rev_text_id=old_id' ];
367
			}
368
369
			if ( $this->buffer == WikiExporter::STREAM ) {
370
				$prev = $this->db->bufferResults( false );
371
			}
372
373
			$result = null; // Assuring $result is not undefined, if exception occurs early
374
			try {
375
				Hooks::run( 'ModifyExportQuery',
376
						[ $this->db, &$tables, &$cond, &$opts, &$join ] );
377
378
				# Do the query!
379
				$result = $this->db->select( $tables, '*', $cond, __METHOD__, $opts, $join );
380
				# Output dump results
381
				$this->outputPageStream( $result );
382
383
				if ( $this->buffer == WikiExporter::STREAM ) {
384
					$this->db->bufferResults( $prev );
385
				}
386
			} catch ( Exception $e ) {
387
				// Throwing the exception does not reliably free the resultset, and
388
				// would also leave the connection in unbuffered mode.
389
390
				// Freeing result
391
				try {
392
					if ( $result ) {
393
						$result->free();
394
					}
395
				} catch ( Exception $e2 ) {
396
					// Already in panic mode -> ignoring $e2 as $e has
397
					// higher priority
398
				}
399
400
				// Putting database back in previous buffer mode
401
				try {
402
					if ( $this->buffer == WikiExporter::STREAM ) {
403
						$this->db->bufferResults( $prev );
404
					}
405
				} catch ( Exception $e2 ) {
406
					// Already in panic mode -> ignoring $e2 as $e has
407
					// higher priority
408
				}
409
410
				// Inform caller about problem
411
				throw $e;
412
			}
413
		}
414
	}
415
416
	/**
417
	 * Runs through a query result set dumping page and revision records.
418
	 * The result set should be sorted/grouped by page to avoid duplicate
419
	 * page records in the output.
420
	 *
421
	 * Should be safe for
422
	 * streaming (non-buffered) queries, as long as it was made on a
423
	 * separate database connection not managed by LoadBalancer; some
424
	 * blob storage types will make queries to pull source data.
425
	 *
426
	 * @param ResultWrapper $resultset
427
	 */
428
	protected function outputPageStream( $resultset ) {
429
		$last = null;
430
		foreach ( $resultset as $row ) {
431
			if ( $last === null ||
432
				$last->page_namespace != $row->page_namespace ||
433
				$last->page_title != $row->page_title ) {
434 View Code Duplication
				if ( $last !== null ) {
435
					$output = '';
436
					if ( $this->dumpUploads ) {
437
						$output .= $this->writer->writeUploads( $last, $this->dumpUploadFileContents );
438
					}
439
					$output .= $this->writer->closePage();
440
					$this->sink->writeClosePage( $output );
441
				}
442
				$output = $this->writer->openPage( $row );
443
				$this->sink->writeOpenPage( $row, $output );
Bug: It seems like $row defined by $row on line 430 can also be of type null; however, DumpOutput::writeOpenPage() only seems to accept object. Maybe add an additional type check?

If a method or function can return values of several different types, and you are not sure that only one of them can occur in this context, we recommend adding a type check:

/**
 * @return array|string
 */
function returnsDifferentValues($x) {
    if ($x) {
        return 'foo';
    }

    return array();
}

$x = returnsDifferentValues($y);
if (is_array($x)) {
    // $x is an array.
}

If this is a common case that PHP Analyzer should handle natively, please let us know by opening an issue.
444
				$last = $row;
445
			}
446
			$output = $this->writer->writeRevision( $row );
447
			$this->sink->writeRevision( $row, $output );
Bug: It seems like $row defined by $row on line 430 can also be of type null; however, DumpOutput::writeRevision() only seems to accept object. Maybe add an additional type check?

448
		}
449 View Code Duplication
		if ( $last !== null ) {
450
			$output = '';
451
			if ( $this->dumpUploads ) {
452
				$output .= $this->writer->writeUploads( $last, $this->dumpUploadFileContents );
453
			}
454
			$output .= $this->author_list;
455
			$output .= $this->writer->closePage();
456
			$this->sink->writeClosePage( $output );
457
		}
458
	}
459
460
	/**
461
	 * @param ResultWrapper $resultset
462
	 */
463
	protected function outputLogStream( $resultset ) {
464
		foreach ( $resultset as $row ) {
465
			$output = $this->writer->writeLogItem( $row );
466
			$this->sink->writeLogItem( $row, $output );
Bug: It seems like $row defined by $row on line 464 can also be of type null; however, DumpOutput::writeLogItem() only seems to accept object. Maybe add an additional type check?

467
		}
468
	}
469
}
470