Completed
Push — master ( 2d9641...07f003 )
by Der Mundschenk
12:27 queued 10:23
created

PHP_Typography   B

Complexity

Total Complexity 52

Size/Duplication

Total Lines 426
Duplicated Lines 0 %

Test Coverage

Coverage 100%

Importance

Changes 0
Metric Value
wmc 52
eloc 117
dl 0
loc 426
ccs 126
cts 126
cp 1
rs 7.44
c 0
b 0
f 0

16 Methods

Rating   Name   Duplication   Size   Complexity  
A process() 0 4 1
A process_feed() 0 4 1
A __construct() 0 3 1
A replace_node_with_html() 0 32 5
A parse_html() 0 29 6
A get_html5_parser() 0 9 2
A set_hyphenator_cache() 0 6 2
A get_registry() 0 9 3
A arrays_intersect() 0 8 3
B process_textnodes() 0 48 10
A handle_parsing_errors() 0 7 3
A query_tags_to_ignore() 0 23 6
A get_hyphenator_cache() 0 6 2
A get_hyphenation_languages() 0 2 1
A get_diacritic_languages() 0 2 1
A get_language_plugin_list() 0 33 5

How to fix   Complexity   

Complex Class

Complex classes like PHP_Typography often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to finding such a component is to look for fields and methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

While breaking up the class, it is a good idea to analyze how other classes use PHP_Typography, and based on these observations, apply Extract Interface, too.
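As an illustration of the advice above, the three language-listing methods (get_language_plugin_list(), get_hyphenation_languages(), get_diacritic_languages()) share a common naming pattern and do not touch any instance state, so they are one plausible candidate for Extract Class plus Extract Interface. The following is only a minimal sketch: the names Language_Source and Language_Directory are hypothetical and not part of PHP-Typography, and the directory scan uses scandir() rather than the opendir()/readdir() loop of the original.

```php
<?php
/**
 * Hypothetical interface (not part of PHP-Typography): anything that can
 * list available languages.
 */
interface Language_Source {
	/**
	 * @return string[] An array in the form ( $language_code => $language_name ).
	 */
	public function get_languages();
}

/**
 * Hypothetical extracted class: reads language names from the JSON files
 * in a single directory.
 */
class Language_Directory implements Language_Source {

	/**
	 * The directory to scan (with trailing slash).
	 *
	 * @var string
	 */
	private $path;

	public function __construct( $path ) {
		$this->path = $path;
	}

	public function get_languages() {
		$languages = [];

		// Abort early if the directory does not exist or cannot be read.
		if ( ! is_dir( $this->path ) ) {
			return $languages;
		}
		$files = scandir( $this->path );
		if ( false === $files ) {
			return $languages;
		}

		foreach ( $files as $file ) {
			// We only want the JSON files.
			if ( '.json' !== substr( $file, -5 ) ) {
				continue;
			}

			// Same pattern as in the original method.
			$content = (string) file_get_contents( $this->path . $file );
			if ( preg_match( '/"language"\s*:\s*((".+")|(\'.+\'))\s*,/', $content, $matches ) ) {
				$languages[ substr( $file, 0, -5 ) ] = substr( $matches[1], 1, -1 );
			}
		}

		// Sort by language name, keeping the language codes as keys.
		asort( $languages );

		return $languages;
	}
}
```

With such a component in place, the two public static methods could delegate to a Language_Directory for the `/lang/` and `/diacritics/` directories, and callers that only need a language list could depend on the interface instead of on PHP_Typography itself.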

<?php
/**
 *  This file is part of PHP-Typography.
 *
 *  Copyright 2014-2018 Peter Putzer.
 *  Copyright 2009-2011 KINGdesk, LLC.
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation; either version 2 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License along
 *  with this program; if not, write to the Free Software Foundation, Inc.,
 *  51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 *
 *  ***
 *
 *  @package mundschenk-at/php-typography
 *  @license http://www.gnu.org/licenses/gpl-2.0.html
 */

namespace PHP_Typography;

use PHP_Typography\Fixes\Registry;
use PHP_Typography\Fixes\Default_Registry;

/**
 * Parses HTML5 (or plain text) and applies various typographic fixes to the text.
 *
 * If used with multibyte language, UTF-8 encoding is required.
 *
 * Portions of this code have been inspired by:
 *  - typogrify (https://code.google.com/p/typogrify/)
 *  - WordPress code for wptexturize (https://developer.wordpress.org/reference/functions/wptexturize/)
 *  - PHP SmartyPants Typographer (https://michelf.ca/projects/php-smartypants/typographer/)
 *
 *  @author Jeffrey D. King <[email protected]>
 *  @author Peter Putzer <[email protected]>
 */
class PHP_Typography {

	/**
	 * A DOM-based HTML5 parser.
	 *
	 * @var \Masterminds\HTML5
	 */
	private $html5_parser;

	/**
	 * The hyphenator cache.
	 *
	 * @var Hyphenator\Cache
	 */
	protected $hyphenator_cache;

	/**
	 * The node fixes registry.
	 *
	 * @var Registry|null
	 */
	private $registry;

	/**
	 * Whether the Hyphenator\Cache of the $registry needs to be updated.
	 *
	 * @var bool
	 */
	private $update_registry_cache;

	/**
	 * Sets up a new PHP_Typography object.
	 *
	 * @param Registry|null $registry Optional. A fix registry instance. Default null,
	 *                                meaning the default fixes are used.
	 */
	public function __construct( Registry $registry = null ) {
		$this->registry              = $registry;
		$this->update_registry_cache = ! empty( $registry );
	}

	/**
	 * Modifies $html according to the defined settings.
	 *
	 * @since 6.0.0 Parameter $body_classes added.
	 *
	 * @param string   $html         A HTML fragment.
	 * @param Settings $settings     A settings object.
	 * @param bool     $is_title     Optional. If the HTML fragment is a title. Default false.
	 * @param string[] $body_classes Optional. CSS classes added to the virtual
	 *                               <body> element used for processing. Default [].
	 *
	 * @return string The processed $html.
	 */
	public function process( $html, Settings $settings, $is_title = false, array $body_classes = [] ) {
		return $this->process_textnodes( $html, function( $html, $settings, $is_title ) {
			$this->get_registry()->apply_fixes( $html, $settings, $is_title, false );
		}, $settings, $is_title, $body_classes );
	}

	/**
	 * Modifies $html according to the defined settings, in a way that is appropriate for RSS feeds
	 * (i.e. excluding processes that may not display well with limited character set intelligence).
	 *
	 * @since 6.0.0 Parameter $body_classes added.
	 *
	 * @param string   $html         A HTML fragment.
	 * @param Settings $settings     A settings object.
	 * @param bool     $is_title     Optional. If the HTML fragment is a title. Default false.
	 * @param string[] $body_classes Optional. CSS classes added to the virtual
	 *                               <body> element used for processing. Default [].
	 *
	 * @return string The processed $html.
	 */
	public function process_feed( $html, Settings $settings, $is_title = false, array $body_classes = [] ) {
		return $this->process_textnodes( $html, function( $html, $settings, $is_title ) {
			$this->get_registry()->apply_fixes( $html, $settings, $is_title, true );
		}, $settings, $is_title, $body_classes );
	}

	/**
	 * Applies specific fixes to all textnodes of the HTML fragment.
	 *
	 * @since 6.0.0 Parameter $body_classes added.
	 *
	 * @param string   $html         A HTML fragment.
	 * @param callable $fixer        A callback that applies typography fixes to a single textnode.
	 * @param Settings $settings     A settings object.
	 * @param bool     $is_title     Optional. If the HTML fragment is a title. Default false.
	 * @param string[] $body_classes Optional. CSS classes added to the virtual
	 *                               <body> element used for processing. Default [].
	 *
	 * @return string The processed $html.
	 */
	public function process_textnodes( $html, callable $fixer, Settings $settings, $is_title = false, array $body_classes = [] ) {
		if ( isset( $settings['ignoreTags'] ) && $is_title && ( \in_array( 'h1', /** Array. @scrutinizer ignore-type */ $settings['ignoreTags'], true ) || \in_array( 'h2', /** Array. @scrutinizer ignore-type */ $settings['ignoreTags'], true ) ) ) {
			return $html;
		}

		// Lazy-load our parser (the text parser is not needed for feeds).
		$html5_parser = $this->get_html5_parser();

		// Parse the HTML.
		$dom = $this->parse_html( $html5_parser, $html, $settings, $body_classes );

		// Abort if there were parsing errors.
		if ( empty( $dom ) ) {
			return $html;
		}

		// Query some nodes in the DOM.
		$xpath          = new \DOMXPath( $dom );
		$body_node      = $xpath->query( '/html/body' )->item( 0 );
		$tags_to_ignore = $this->query_tags_to_ignore( $xpath, $body_node, $settings );

		// Start processing.
		foreach ( $xpath->query( '//text()', $body_node ) as $textnode ) {
			if (
				// One of the ancestors should be ignored.
				self::arrays_intersect( DOM::get_ancestors( $textnode ), $tags_to_ignore ) ||
				// The node contains only whitespace.
				$textnode->isWhitespaceInElementContent()
			) {
				continue;
			}

			// Store original content.
			$original = $textnode->data;

			// Apply fixes.
			$fixer( $textnode, $settings, $is_title );

			// Until now, we've only been working on a textnode: HTMLify result.
			$new = $textnode->data;

			// Replace original node (if anything was changed).
			if ( $new !== $original ) {
				$this->replace_node_with_html( $textnode, $new );
			}
		}

		return $html5_parser->saveHTML( $body_node->childNodes );
	}

	/**
	 * Determines whether two object arrays intersect. The second array is expected
	 * to use the spl_object_hash for its keys.
	 *
	 * @param array $array1 The keys are ignored.
	 * @param array $array2 This array has to be in the form ( $spl_object_hash => $object ).
	 *
	 * @return boolean
	 */
	protected static function arrays_intersect( array $array1, array $array2 ) {
		foreach ( $array1 as $value ) {
			if ( isset( $array2[ \spl_object_hash( $value ) ] ) ) {
				return true;
			}
		}

		return false;
	}

	/**
	 * Parse HTML5 fragment while ignoring certain warnings for invalid HTML code (e.g. duplicate IDs).
	 *
	 * @since 6.0.0 Parameter $body_classes added.
	 *
	 * @param \Masterminds\HTML5 $parser       An initialized parser object.
	 * @param string             $html         The HTML fragment to parse (not a complete document).
	 * @param Settings           $settings     The settings to apply.
	 * @param string[]           $body_classes Optional. CSS classes added to the virtual
	 *                                         <body> element used for processing. Default [].
	 *
	 * @return \DOMDocument|null The encoding has already been set to UTF-8. Returns null if there were parsing errors.
	 */
	public function parse_html( \Masterminds\HTML5 $parser, $html, Settings $settings, array $body_classes = [] ) {
		// Silence some parsing errors for invalid HTML.
		\set_error_handler( [ $this, 'handle_parsing_errors' ] ); // @codingStandardsIgnoreLine
		$xml_error_handling = \libxml_use_internal_errors( true );

		// Inject <body> classes.
		$body = empty( $body_classes ) ? 'body' : 'body class="' . \implode( ' ', $body_classes ) . '"';

		// Do the actual parsing.
		$dom           = $parser->loadHTML( "<!DOCTYPE html><html><{$body}>{$html}</body></html>" );
		$dom->encoding = 'UTF-8';

		// Restore original error handling.
		\libxml_clear_errors();
		\libxml_use_internal_errors( $xml_error_handling );
		\restore_error_handler();

		// Handle any parser errors.
		$errors = $parser->getErrors();
		if ( ! empty( $settings['parserErrorsHandler'] ) && ! empty( $errors ) ) {
			$errors = $settings['parserErrorsHandler']( $errors );
		}

		// Return null if there are still unhandled parsing errors.
		if ( ! empty( $errors ) && ! $settings['parserErrorsIgnore'] ) {
			$dom = null;
		}

		return $dom;
	}

	/**
	 * Silently handle certain HTML parsing errors.
	 *
	 * @since 6.0.0 Unused parameters $errline and $errcontext removed.
	 *
	 * @param int    $errno      Error number.
	 * @param string $errstr     Error message.
	 * @param string $errfile    The file in which the error occurred.
	 *
	 * @return boolean Returns true if the error was handled, false otherwise.
	 */
	public function handle_parsing_errors( $errno, $errstr, $errfile ) {
		if ( ! ( \error_reporting() & $errno ) ) { // @codingStandardsIgnoreLine.
			return true; // not interesting.
		}

		// Ignore warnings from parser & let PHP handle the rest.
		return $errno & E_USER_WARNING && 0 === \substr_compare( $errfile, 'DOMTreeBuilder.php', -18 );
	}

	/**
	 * Retrieves an array of nodes that should be skipped during processing.
	 *
	 * @param \DOMXPath $xpath        A valid XPath instance for the DOM to be queried.
	 * @param \DOMNode  $initial_node The starting node of the XPath query.
	 * @param Settings  $settings     The settings to apply.
	 *
	 * @return \DOMNode[] An array of \DOMNode (can be empty).
	 */
	public function query_tags_to_ignore( \DOMXPath $xpath, \DOMNode $initial_node, Settings $settings ) {
		$elements    = [];
		$query_parts = [];
		if ( ! empty( $settings['ignoreTags'] ) ) {
			$query_parts[] = '//' . \implode( ' | //', /** Array. @scrutinizer ignore-type */ $settings['ignoreTags'] );
		}
		if ( ! empty( $settings['ignoreClasses'] ) ) {
			$query_parts[] = "//*[contains(concat(' ', @class, ' '), ' " . \implode( " ') or contains(concat(' ', @class, ' '), ' ", /** Array. @scrutinizer ignore-type */ $settings['ignoreClasses'] ) . " ')]";
		}
		if ( ! empty( $settings['ignoreIDs'] ) ) {
			$query_parts[] = '//*[@id=\'' . \implode( '\' or @id=\'', /** Array. @scrutinizer ignore-type */ $settings['ignoreIDs'] ) . '\']';
		}

		if ( ! empty( $query_parts ) ) {
			$ignore_query = \implode( ' | ', $query_parts );

			$nodelist = $xpath->query( $ignore_query, $initial_node );
			if ( false !== $nodelist ) {
				$elements = DOM::nodelist_to_array( $nodelist );
			}
		}

		return $elements;
	}

	/**
	 * Replaces the given node with HTML content. Uses the HTML5 parser.
	 *
	 * @param \DOMNode $node    The node to replace.
	 * @param string   $content The HTML fragment used to replace the node.
	 *
	 * @return \DOMNode|array An array of \DOMNode containing the new nodes or the old \DOMNode if the replacement failed.
	 */
	public function replace_node_with_html( \DOMNode $node, $content ) {
		$result = $node;

		$parent = $node->parentNode;
		if ( empty( $parent ) ) {
			return $node; // abort early to save cycles.
		}

		// Encode bare < > & and decode escaped HTML tag.
		$content = RE::unescape_tags( htmlspecialchars( $content, ENT_NOQUOTES | ENT_HTML5, 'UTF-8', false ) );

		\set_error_handler( [ $this, 'handle_parsing_errors' ] ); // @codingStandardsIgnoreLine.

		$html_fragment = $this->get_html5_parser()->loadHTMLFragment( $content );
		if ( ! empty( $html_fragment ) ) {
			$imported_fragment = $node->ownerDocument->importNode( $html_fragment, true );

			if ( ! empty( $imported_fragment ) ) {
				// Save the children of the imported DOMDocumentFragment before replacement.
				$children = DOM::nodelist_to_array( $imported_fragment->childNodes );

				if ( false !== $parent->replaceChild( $imported_fragment, $node ) ) {
					// Success! We return the saved array of DOMNodes as
					// $imported_fragment is just an empty DOMDocumentFragment now.
					$result = $children;
				}
			}
		}

		\restore_error_handler();

		return $result;
	}

	/**
	 * Retrieves the fix registry.
	 *
	 * @return Registry
	 */
	public function get_registry() {
		if ( ! isset( $this->registry ) ) {
			$this->registry = new Default_Registry( $this->get_hyphenator_cache() );
		} elseif ( $this->update_registry_cache ) {
			$this->registry->update_hyphenator_cache( $this->get_hyphenator_cache() );

Bug: The method update_hyphenator_cache() does not exist on null. (Ignorable by Annotation)

If this is a false positive, you can also ignore this issue in your code via the ignore-call annotation:

			$this->registry->/** @scrutinizer ignore-call */
				update_hyphenator_cache( $this->get_hyphenator_cache() );

This check looks for calls to methods that do not seem to exist on a given type. It looks for the method on the type itself as well as in inherited classes or implemented interfaces.

This is most likely a typographical error or the method has been renamed.

			$this->update_registry_cache = false;
		}

		return $this->registry;
	}

	/**
	 * Retrieves the HTML5 parser instance.
	 *
	 * @return \Masterminds\HTML5
	 */
	public function get_html5_parser() {
		// Lazy-load HTML5 parser.
		if ( ! isset( $this->html5_parser ) ) {
			$this->html5_parser = new \Masterminds\HTML5( [
				'disable_html_ns' => true,
			] );
		}

		return $this->html5_parser;
	}

	/**
	 * Retrieves the hyphenator cache.
	 *
	 * @return Hyphenator\Cache
	 */
	public function get_hyphenator_cache() {
		if ( ! isset( $this->hyphenator_cache ) ) {
			$this->hyphenator_cache = new Hyphenator\Cache();
		}

		return $this->hyphenator_cache;
	}

	/**
	 * Injects an existing Hyphenator\Cache (to facilitate persistent language caching).
	 *
	 * @param Hyphenator\Cache $cache A hyphenator cache instance.
	 */
	public function set_hyphenator_cache( Hyphenator\Cache $cache ) {
		$this->hyphenator_cache = $cache;

		// Change hyphenator cache for existing token fixes.
		if ( isset( $this->registry ) ) {
			$this->registry->update_hyphenator_cache( $cache );
		}
	}

	/**
	 * Retrieves the list of valid language plugins in the given directory.
	 *
	 * @param string $path The path in which to look for language plugin files.
	 *
	 * @return string[] An array in the form ( $language_code => $language_name ).
	 */
	private static function get_language_plugin_list( $path ) {
		$languages = [];

		// Try to open the given directory.
		$handle = \opendir( $path );
		if ( false === $handle ) {
			// Abort.
			return $languages; // @codeCoverageIgnore
		}

		// Read all files in directory.
		$file = \readdir( $handle );
		while ( $file ) {
			// We only want the JSON files.
			if ( '.json' === \substr( $file, -5 ) ) {
				$file_content = \file_get_contents( $path . $file );
				if ( \preg_match( '/"language"\s*:\s*((".+")|(\'.+\'))\s*,/', $file_content, $matches ) ) {
					$language_name = \substr( $matches[1], 1, -1 );
					$language_code = \substr( $file, 0, -5 );

					$languages[ $language_code ] = $language_name;
				}
			}

			// Read next file.
			$file = \readdir( $handle );
		}
		\closedir( $handle );

		// Sort translated language names according to current locale.
		\asort( $languages );

		return $languages;
	}

	/**
	 * Retrieves the list of valid hyphenation languages.
	 *
	 * Note that this method reads all the language files on disc, so you should
	 * cache the results if possible.
	 *
	 * @return string[] An array in the form of ( LANG_CODE => LANGUAGE ).
	 */
	public static function get_hyphenation_languages() {
		return self::get_language_plugin_list( __DIR__ . '/lang/' );
	}

	/**
	 * Retrieves the list of valid diacritic replacement languages.
	 *
	 * Note that this method reads all the language files on disc, so you should
	 * cache the results if possible.
	 *
	 * @return string[] An array in the form of ( LANG_CODE => LANGUAGE ).
	 */
	public static function get_diacritic_languages() {
		return self::get_language_plugin_list( __DIR__ . '/diacritics/' );
	}
}
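
The issue reported in get_registry() arises because $this->registry is declared as Registry|null. At runtime the elseif branch is only reached when the property is set, so the call is safe, but the analyzer does not infer that. Besides the ignore-call annotation, one possible alternative is to copy the property into a local variable and narrow it with instanceof. The sketch below uses hypothetical stand-in classes (Cache_Stub, Registry_Stub, Typography_Sketch) so that it runs on its own; it has not been checked against Scrutinizer, so whether it silences this particular report is an assumption.

```php
<?php
// Hypothetical stand-ins for Hyphenator\Cache and Fixes\Registry,
// used for illustration only.
class Cache_Stub {}

class Registry_Stub {
	/** @var Cache_Stub|null */
	public $cache;

	public function __construct( ?Cache_Stub $cache = null ) {
		$this->cache = $cache;
	}

	public function update_hyphenator_cache( Cache_Stub $cache ) {
		$this->cache = $cache;
	}
}

// Hypothetical reduced version of PHP_Typography with only the registry logic.
class Typography_Sketch {
	/** @var Registry_Stub|null */
	private $registry;

	/** @var bool */
	private $update_registry_cache;

	/** @var Cache_Stub|null */
	private $hyphenator_cache;

	public function __construct( ?Registry_Stub $registry = null ) {
		$this->registry              = $registry;
		$this->update_registry_cache = null !== $registry;
	}

	public function get_hyphenator_cache() {
		if ( null === $this->hyphenator_cache ) {
			$this->hyphenator_cache = new Cache_Stub();
		}

		return $this->hyphenator_cache;
	}

	public function get_registry() {
		// A local variable can be narrowed by instanceof; a property cannot
		// always be, because other code might change it between checks.
		$registry = $this->registry;

		if ( ! $registry instanceof Registry_Stub ) {
			$registry       = new Registry_Stub( $this->get_hyphenator_cache() );
			$this->registry = $registry;
		} elseif ( $this->update_registry_cache ) {
			$registry->update_hyphenator_cache( $this->get_hyphenator_cache() );
			$this->update_registry_cache = false;
		}

		return $registry;
	}
}
```

The behavior matches the original method: a missing registry is created lazily with the current cache, and an injected registry receives the cache once on first access.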