PHP_Typography - Code Metrics - Inspection of "Reduce coupling" - mundschenk-at/php-typography - Measure and Improve Code Quality continuously with Scrutinizer

Passed

Pull Request — master (#44)

by Der Mundschenk

created 2017-12-12 21:44 UTC

PHP_Typography C

↳ Parent: Project

Complexity

Total Complexity

Size/Duplication

Total Lines	425
Duplicated Lines	10.12 %

Coupling/Cohesion

Components	2
Dependencies	4

Importance

Changes

Metric	Value
wmc	56
lcom	2
cbo	4
dl	43
loc	425
rs	6.5957
c	0
b	0
f	0

18 Methods

Rating	Name	Duplication	Size	Complexity
A	__construct()	0	4	1
A	process()	0	3	1
A	process_feed()	0	3	1
D	process_textnodes()	0	45	9
A	arrays_intersect()	0	9	3
A	apply_fixes_to_html_node()	7	7	3
A	apply_fixes_to_feed_node()	9	9	4
B	parse_html()	0	27	5
A	handle_parsing_errors()	0	8	3
B	query_tags_to_ignore()	17	24	6
B	replace_node_with_html()	10	30	5
A	get_registry()	0	10	3
A	get_html5_parser()	0	10	2
A	get_hyphenator_cache()	0	7	2
A	set_hyphenator_cache()	0	8	2
B	get_language_plugin_list()	0	29	4
A	get_hyphenation_languages()	0	3	1
A	get_diacritic_languages()	0	3	1

How to fix

Duplicated Code

Duplicate code is one of the most pungent code smells. A rule that is often used is to re-structure code once it is duplicated in three or more places.

Common duplication problems and their corresponding solutions are:

If you have the same expression in different places: Extract expression to a method
If you have the same method in different sub-classes: Extract method, and pull up field to the parent class
If you have the same code in unrelated classes: Consider extracting the code to a new class, and injecting that class
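
As a minimal sketch of the first case, the two near-identical loops in apply_fixes_to_html_node() and apply_fixes_to_feed_node() could be folded into one helper that takes a flag. The Sketch_Fix interface, the Callback_Fix class and the apply_fixes() function are hypothetical stand-ins invented for this example and work on plain strings rather than DOM text nodes; only the method names apply() and feed_compatible() are taken from the inspected code.

```php
<?php
// Hypothetical stand-in for the library's fix objects (an assumption for this sketch).
interface Sketch_Fix {
	public function apply( $text );     // Returns the fixed text.
	public function feed_compatible();  // Returns true if the fix is safe for feeds.
}

// A fix that simply wraps a callable, so the sketch stays self-contained.
final class Callback_Fix implements Sketch_Fix {
	private $callback;
	private $feed_compatible;

	public function __construct( callable $callback, $feed_compatible ) {
		$this->callback        = $callback;
		$this->feed_compatible = (bool) $feed_compatible;
	}

	public function apply( $text ) {
		return call_user_func( $this->callback, $text );
	}

	public function feed_compatible() {
		return $this->feed_compatible;
	}
}

// The single extracted loop: both the HTML and the feed variant call this,
// differing only in the $feed_only flag.
function apply_fixes( array $fix_groups, $text, $feed_only = false ) {
	foreach ( $fix_groups as $fixes ) {
		foreach ( $fixes as $fix ) {
			if ( $feed_only && ! $fix->feed_compatible() ) {
				continue; // Skip fixes that are not suitable for feeds.
			}
			$text = $fix->apply( $text );
		}
	}

	return $text;
}

$groups = [
	'characters' => [ new Callback_Fix( 'strtoupper', true ) ],
	'spacing'    => [ new Callback_Fix( 'strrev', false ) ],
];

echo apply_fixes( $groups, 'abc' ), "\n";       // All fixes: prints "CBA".
echo apply_fixes( $groups, 'abc', true ), "\n"; // Feed-compatible fixes only: prints "ABC".
```

With a helper like this, each of the two public entry points shrinks to a one-line call, and the duplicated nested loop exists in only one place.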

Complex Class

Tip: Before tackling complexity, make sure that you eliminate any duplication first. This often can reduce the size of classes significantly.

Complex classes like PHP_Typography often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to finding such a component is to look for fields/methods that share the same prefixes or suffixes. You can also have a look at the cohesion graph to spot any unconnected or weakly connected components.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

While breaking up the class, it is a good idea to analyze how other classes use PHP_Typography, and based on these observations, apply Extract Interface, too.
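
One candidate component in the class below is the language listing: get_language_plugin_list(), get_hyphenation_languages() and get_diacritic_languages() share a naming pattern and use none of the instance fields. The following is a rough sketch of Extract Class combined with Extract Interface for that group. The names Language_Lister and Json_Language_Lister are hypothetical and not part of the library; the regular expression is the one used in the inspected code, while glob() replaces the opendir()/readdir() loop purely for brevity.

```php
<?php
// Hypothetical interface produced by Extract Interface (name is an assumption).
interface Language_Lister {
	/**
	 * @return string[] An array in the form ( $language_code => $language_name ).
	 */
	public function get_languages();
}

// Hypothetical class produced by Extract Class (name is an assumption).
final class Json_Language_Lister implements Language_Lister {
	private $path;

	public function __construct( $path ) {
		$this->path = rtrim( $path, '/' ) . '/';
	}

	public function get_languages() {
		$pattern   = '/"language"\s*:\s*((".+")|(\'.+\'))\s*,/';
		$languages = [];

		// glob() may return false on error, so fall back to an empty list.
		$files = glob( $this->path . '*.json' );
		foreach ( $files ?: [] as $file ) {
			$content = file_get_contents( $file );
			if ( false !== $content && preg_match( $pattern, $content, $matches ) ) {
				// Strip the surrounding quotes from the matched language name.
				$languages[ basename( $file, '.json' ) ] = substr( $matches[1], 1, -1 );
			}
		}

		asort( $languages ); // Sort by language name, keeping the codes as keys.

		return $languages;
	}
}

// Usage with a throwaway directory so the sketch runs on its own.
$dir = sys_get_temp_dir() . '/lang-sketch-' . uniqid();
mkdir( $dir );
file_put_contents( $dir . '/de.json', '{"language":"German","patterns":{}}' );
file_put_contents( $dir . '/af.json', '{"language":"Afrikaans","patterns":{}}' );
file_put_contents( $dir . '/notes.txt', 'not a language file' );

$lister    = new Json_Language_Lister( $dir );
$languages = $lister->get_languages();
print_r( $languages );

// Clean up the temporary files.
array_map( 'unlink', glob( $dir . '/*' ) );
rmdir( $dir );
```

Callers would then depend on the small interface instead of the static methods, which also makes it straightforward to substitute a cached implementation.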

<?php
/**
 *  This file is part of PHP-Typography.
 *
 *  Copyright 2014-2017 Peter Putzer.
 *  Copyright 2009-2011 KINGdesk, LLC.
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation; either version 2 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License along
 *  with this program; if not, write to the Free Software Foundation, Inc.,
 *  51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 *
 *  ***
 *
 *  @package mundschenk-at/php-typography
 *  @license http://www.gnu.org/licenses/gpl-2.0.html
 */

namespace PHP_Typography;

use PHP_Typography\Fixes\Registry;

/**
 * Parses HTML5 (or plain text) and applies various typographic fixes to the text.
 *
 * If used with multibyte language, UTF-8 encoding is required.
 *
 * Portions of this code have been inspired by:
 *  - typogrify (https://code.google.com/p/typogrify/)
 *  - WordPress code for wptexturize (https://developer.wordpress.org/reference/functions/wptexturize/)
 *  - PHP SmartyPants Typographer (https://michelf.ca/projects/php-smartypants/typographer/)
 *
 *  @author Jeffrey D. King <[email protected]>
 *  @author Peter Putzer <[email protected]>
 */
class PHP_Typography {

	/**
	 * A DOM-based HTML5 parser.
	 *
	 * @var \Masterminds\HTML5
	 */
	private $html5_parser;

	/**
	 * The hyphenator cache.
	 *
	 * @var Hyphenator\Cache
	 */
	protected $hyphenator_cache;

	/**
	 * The node fixes registry.
	 *
	 * @var Registry
	 */
	private $registry;

	/**
	 * Whether the Hyphenator\Cache of the $registry needs to be updated.
	 *
	 * @var bool
	 */
	private $update_registry_cache;

	/**
	 * Sets up a new PHP_Typography object.
	 *
	 * @param Registry|null $registry Optional. A fix registry instance. Default null,
	 *                                meaning the default fixes are used.
	 */
	public function __construct( Registry $registry = null ) {
		$this->registry              = $registry;
		$this->update_registry_cache = ! empty( $registry );
	}

	/**
	 * Modifies $html according to the defined settings.
	 *
	 * @param string   $html      A HTML fragment.
	 * @param Settings $settings  A settings object.
	 * @param bool     $is_title  Optional. If the HTML fragment is a title. Default false.
	 *
	 * @return string The processed $html.
	 */
	public function process( $html, Settings $settings, $is_title = false ) {
		return $this->process_textnodes( $html, [ $this, 'apply_fixes_to_html_node' ], $settings, $is_title );
	}

	/**
	 * Modifies $html according to the defined settings, in a way that is appropriate for RSS feeds
	 * (i.e. excluding processes that may not display well with limited character set intelligence).
	 *
	 * @param string   $html     A HTML fragment.
	 * @param Settings $settings  A settings object.
	 * @param bool     $is_title Optional. If the HTML fragment is a title. Default false.
	 *
	 * @return string The processed $html.
	 */
	public function process_feed( $html, Settings $settings, $is_title = false ) {
		return $this->process_textnodes( $html, [ $this, 'apply_fixes_to_feed_node' ], $settings, $is_title );
	}

	/**
	 * Applies specific fixes to all textnodes of the HTML fragment.
	 *
	 * @param string   $html     A HTML fragment.
	 * @param callable $fixer    A callback that applies typography fixes to a single textnode.
	 * @param Settings $settings  A settings object.
	 * @param bool     $is_title Optional. If the HTML fragment is a title. Default false.
	 *
	 * @return string The processed $html.
	 */
	public function process_textnodes( $html, callable $fixer, Settings $settings, $is_title = false ) {
		if ( isset( $settings['ignoreTags'] ) && $is_title && ( in_array( 'h1', $settings['ignoreTags'], true ) || in_array( 'h2', $settings['ignoreTags'], true ) ) ) {
			return $html;
		}

		// Lazy-load our parser (the text parser is not needed for feeds).
		$html5_parser = $this->get_html5_parser();

		// Parse the HTML.
		$dom = $this->parse_html( $html5_parser, $html, $settings );

		// Abort if there were parsing errors.
		if ( empty( $dom ) ) {
			return $html;
		}

		// Query some nodes in the DOM.
		$xpath          = new \DOMXPath( $dom );
		$body_node      = $xpath->query( '/html/body' )->item( 0 );
		$all_textnodes  = $xpath->query( '//text()', $body_node );
		$tags_to_ignore = $this->query_tags_to_ignore( $xpath, $body_node, $settings );

		// Start processing.
		foreach ( $all_textnodes as $textnode ) {
			if ( self::arrays_intersect( DOM::get_ancestors( $textnode ), $tags_to_ignore ) ) {
				continue;
			}

			// We won't be doing anything with spaces, so we can jump ship if that is all we have.
			if ( $textnode->isWhitespaceInElementContent() ) {
				continue;
			}

			// Keep all characters decoded except < > &, which are re-encoded here.
			$textnode->data = htmlspecialchars( $textnode->data, ENT_NOQUOTES, 'UTF-8' ); // turns < > & back into encoded HTML characters (&lt; &gt; and &amp; respectively).

			// Apply fixes.
			call_user_func( $fixer, $textnode, $settings, $is_title );

			// Until now, we've only been working on a textnode: HTMLify result.
			$this->replace_node_with_html( $textnode, $textnode->data );
		}

		return $html5_parser->saveHTML( $body_node->childNodes );
	}

	/**
	 * Determines whether two object arrays intersect. The second array is expected
	 * to use the spl_object_hash for its keys.
	 *
	 * @param array $array1 The keys are ignored.
	 * @param array $array2 This array has to be in the form ( $spl_object_hash => $object ).
	 *
	 * @return boolean
	 */
	protected static function arrays_intersect( array $array1, array $array2 ) {
		foreach ( $array1 as $value ) {
			if ( isset( $array2[ spl_object_hash( $value ) ] ) ) {
				return true;
			}
		}

		return false;
	}

	/**
	 * Applies standard typography fixes to a textnode.
	 *
	 * @param \DOMText $textnode The node to process.
	 * @param Settings $settings The settings to apply.
	 * @param bool     $is_title Optional. Default false.
	 */
	protected function apply_fixes_to_html_node( \DOMText $textnode, Settings $settings, $is_title = false ) {

		foreach ( $this->get_registry()->get_node_fixes() as $group => $fixes ) {
			foreach ( $fixes as $fix ) {
				$fix->apply( $textnode, $settings, $is_title );
			}
		}
	}

	/**
	 * Applies typography fixes specific to RSS feeds to a textnode.
	 *
	 * @param \DOMText $textnode The node to process.
	 * @param Settings $settings The settings to apply.
	 * @param bool     $is_title Optional. Default false.
	 */
	protected function apply_fixes_to_feed_node( \DOMText $textnode, Settings $settings, $is_title = false ) {

		foreach ( $this->get_registry()->get_node_fixes() as $group => $fixes ) {
			foreach ( $fixes as $fix ) {
				if ( $fix->feed_compatible() ) {
					$fix->apply( $textnode, $settings, $is_title );
				}
			}
		}
	}

	/**
	 * Parse HTML5 fragment while ignoring certain warnings for invalid HTML code (e.g. duplicate IDs).
	 *
	 * @param \Masterminds\HTML5 $parser   An initialized parser object.
	 * @param string             $html     The HTML fragment to parse (not a complete document).
	 * @param Settings           $settings The settings to apply.
	 *
	 * @return \DOMDocument|null The encoding has already been set to UTF-8. Returns null if there were parsing errors.
	 */
	public function parse_html( \Masterminds\HTML5 $parser, $html, Settings $settings ) {
		// Silence some parsing errors for invalid HTML.
		set_error_handler( [ $this, 'handle_parsing_errors' ] ); // @codingStandardsIgnoreLine
		$xml_error_handling = libxml_use_internal_errors( true );

		// Do the actual parsing.
		$dom           = $parser->loadHTML( '<!DOCTYPE html><html><body>' . $html . '</body></html>' );
		$dom->encoding = 'UTF-8';

		// Restore original error handling.
		libxml_clear_errors();
		libxml_use_internal_errors( $xml_error_handling );
		restore_error_handler();

		// Handle any parser errors.
		$errors = $parser->getErrors();
		if ( ! empty( $settings['parserErrorsHandler'] ) && ! empty( $errors ) ) {
			$errors = call_user_func( $settings['parserErrorsHandler'], $errors );
		}

		// Return null if there are still unhandled parsing errors.
		if ( ! empty( $errors ) && ! $settings['parserErrorsIgnore'] ) {
			$dom = null;
		}

		return $dom;
	}

	/**
	 * Silently handle certain HTML parsing errors.
	 *
	 * @param int    $errno      Error number.
	 * @param string $errstr     Error message.
	 * @param string $errfile    The file in which the error occurred.
	 * @param int    $errline    The line in which the error occurred.
	 * @param array  $errcontext Optional. Calling context (no longer passed by PHP 8). Default [].
	 *
	 * @return boolean Returns true if the error was handled, false otherwise.
	 */
	public function handle_parsing_errors( $errno, $errstr, $errfile, $errline, array $errcontext = [] ) {

		if ( ! ( error_reporting() & $errno ) ) { // @codingStandardsIgnoreLine.
			return true; // not interesting.
		}

		// Ignore warnings from parser & let PHP handle the rest.
		return $errno & E_USER_WARNING && 0 === substr_compare( $errfile, 'DOMTreeBuilder.php', -18 );
	}

	/**
	 * Retrieves an array of nodes that should be skipped during processing.
	 *
	 * @param \DOMXPath $xpath        A valid XPath instance for the DOM to be queried.
	 * @param \DOMNode  $initial_node The starting node of the XPath query.
	 * @param Settings  $settings     The settings to apply.
	 *
	 * @return \DOMNode[] An array of \DOMNode (can be empty).
	 */
	public function query_tags_to_ignore( \DOMXPath $xpath, \DOMNode $initial_node, Settings $settings ) {
		$elements    = [];
		$query_parts = [];
		if ( ! empty( $settings['ignoreTags'] ) ) {
			$query_parts[] = '//' . implode( ' | //', $settings['ignoreTags'] );
		}
		if ( ! empty( $settings['ignoreClasses'] ) ) {
			$query_parts[] = "//*[contains(concat(' ', @class, ' '), ' " . implode( " ') or contains(concat(' ', @class, ' '), ' ", $settings['ignoreClasses'] ) . " ')]";
		}
		if ( ! empty( $settings['ignoreIDs'] ) ) {
			$query_parts[] = '//*[@id=\'' . implode( '\' or @id=\'', $settings['ignoreIDs'] ) . '\']';
		}

		if ( ! empty( $query_parts ) ) {
			$ignore_query = implode( ' | ', $query_parts );

			$nodelist = $xpath->query( $ignore_query, $initial_node );
			if ( false !== $nodelist ) {
				$elements = DOM::nodelist_to_array( $nodelist );
			}
		}

		return $elements;
	}

	/**
	 * Replaces the given node with HTML content. Uses the HTML5 parser.
	 *
	 * @param \DOMNode $node    The node to replace.
	 * @param string   $content The HTML fragment used to replace the node.
	 *
	 * @return \DOMNode|array An array of \DOMNode containing the new nodes or the old \DOMNode if the replacement failed.
	 */
	public function replace_node_with_html( \DOMNode $node, $content ) {
		$result = $node;

		$parent = $node->parentNode;
		if ( empty( $parent ) ) {
			return $node; // abort early to save cycles.
		}

		set_error_handler( [ $this, 'handle_parsing_errors' ] ); // @codingStandardsIgnoreLine.

		$html_fragment = $this->get_html5_parser()->loadHTMLFragment( $content );
		if ( ! empty( $html_fragment ) ) {
			$imported_fragment = $node->ownerDocument->importNode( $html_fragment, true );

			if ( ! empty( $imported_fragment ) ) {
				// Save the children of the imported DOMDocumentFragment before replacement.
				$children = DOM::nodelist_to_array( $imported_fragment->childNodes );

				if ( false !== $parent->replaceChild( $imported_fragment, $node ) ) {
					// Success! We return the saved array of DOMNodes as
					// $imported_fragment is just an empty DOMDocumentFragment now.
					$result = $children;
				}
			}
		}

		restore_error_handler();

		return $result;
	}

	/**
	 * Retrieves the fix registry.
	 *
	 * @return Registry
	 */
	public function get_registry() {
		if ( ! isset( $this->registry ) ) {
			$this->registry = Registry::create( $this->get_hyphenator_cache() );
		} elseif ( $this->update_registry_cache ) {
			$this->registry->update_hyphenator_cache( $this->get_hyphenator_cache() );
			$this->update_registry_cache = false;
		}

		return $this->registry;
	}

	/**
	 * Retrieves the HTML5 parser instance.
	 *
	 * @return \Masterminds\HTML5
	 */
	public function get_html5_parser() {
		// Lazy-load HTML5 parser.
		if ( ! isset( $this->html5_parser ) ) {
			$this->html5_parser = new \Masterminds\HTML5( [
				'disable_html_ns' => true,
			] );
		}

		return $this->html5_parser;
	}

	/**
	 * Retrieves the hyphenator cache.
	 *
	 * @return Hyphenator\Cache
	 */
	public function get_hyphenator_cache() {
		if ( ! isset( $this->hyphenator_cache ) ) {
			$this->hyphenator_cache = new Hyphenator\Cache();
		}

		return $this->hyphenator_cache;
	}

	/**
	 * Injects an existing Hyphenator\Cache (to facilitate persistent language caching).
	 *
	 * @param Hyphenator\Cache $cache A hyphenator cache instance.
	 */
	public function set_hyphenator_cache( Hyphenator\Cache $cache ) {
		$this->hyphenator_cache = $cache;

		// Change hyphenator cache for existing token fixes.
		if ( isset( $this->registry ) ) {
			$this->registry->update_hyphenator_cache( $cache );
		}
	}

	/**
	 * Retrieves the list of valid language plugins in the given directory.
	 *
	 * @param string $path The path in which to look for language plugin files.
	 *
	 * @return string[] An array in the form ( $language_code => $language_name ).
	 */
	private static function get_language_plugin_list( $path ) {
		$language_name_pattern = '/"language"\s*:\s*((".+")|(\'.+\'))\s*,/';
		$languages             = [];
		$handle                = opendir( $path );

		// Read all files in directory.
		$file = readdir( $handle );
		while ( false !== $file ) { // Strict check: a file named "0" must not end the loop.
			// We only want the JSON files.
			if ( '.json' === substr( $file, -5 ) ) {
				$file_content = file_get_contents( $path . $file );
				if ( preg_match( $language_name_pattern, $file_content, $matches ) ) {
					$language_name = substr( $matches[1], 1, -1 );
					$language_code = substr( $file, 0, -5 );

					$languages[ $language_code ] = $language_name;
				}
			}

			// Read next file.
			$file = readdir( $handle );
		}
		closedir( $handle );

		// Sort languages by name (asort() uses its default comparison here, not the current locale).
		asort( $languages );

		return $languages;
	}

	/**
	 * Retrieves the list of valid hyphenation languages.
	 *
	 * Note that this method reads all the language files on disc, so you should
	 * cache the results if possible.
	 *
	 * @return string[] An array in the form of ( LANG_CODE => LANGUAGE ).
	 */
	public static function get_hyphenation_languages() {
		return self::get_language_plugin_list( __DIR__ . '/lang/' );
	}

	/**
	 * Retrieves the list of valid diacritic replacement languages.
	 *
	 * Note that this method reads all the language files on disc, so you should
	 * cache the results if possible.
	 *
	 * @return string[] An array in the form of ( LANG_CODE => LANGUAGE ).
	 */
	public static function get_diacritic_languages() {
		return self::get_language_plugin_list( __DIR__ . '/diacritics/' );
	}
}
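
The three string concatenations in query_tags_to_ignore() above are dense, so it can help to see the XPath expression they produce. The sketch below rebuilds the same expression from plain arrays, using one small helper for the repeated "join predicates with or" pattern. The function names xpath_any_of() and build_ignore_query() are hypothetical and not part of the library. As in the inspected code, values are inserted unescaped, so a class or ID containing a single quote would yield an invalid expression.

```php
<?php
/**
 * Wraps one predicate per value in a single //*[ ... or ... ] step.
 * Hypothetical helper for this sketch.
 */
function xpath_any_of( array $values, callable $predicate ) {
	return '//*[' . implode( ' or ', array_map( $predicate, $values ) ) . ']';
}

/**
 * Builds the union query for ignored tags, classes and IDs.
 * Returns an empty string if there is nothing to ignore.
 */
function build_ignore_query( array $tags, array $classes, array $ids ) {
	$parts = [];

	if ( ! empty( $tags ) ) {
		$parts[] = '//' . implode( ' | //', $tags );
	}
	if ( ! empty( $classes ) ) {
		// Padding with spaces matches whole class names only.
		$parts[] = xpath_any_of( $classes, function ( $class ) {
			return "contains(concat(' ', @class, ' '), ' {$class} ')";
		} );
	}
	if ( ! empty( $ids ) ) {
		$parts[] = xpath_any_of( $ids, function ( $id ) {
			return "@id='{$id}'";
		} );
	}

	return implode( ' | ', $parts );
}

echo build_ignore_query( [ 'code', 'pre' ], [ 'noTypo' ], [ 'a', 'b' ] ), "\n";
// Prints: //code | //pre | //*[contains(concat(' ', @class, ' '), ' noTypo ')] | //*[@id='a' or @id='b']
```

The resulting string is what would be handed to DOMXPath::query() with the body node as context.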


1		<?php
2		/**
3		* This file is part of PHP-Typography.
4		*
5		* Copyright 2014-2017 Peter Putzer.
6		* Copyright 2009-2011 KINGdesk, LLC.
7		*
8		* This program is free software; you can redistribute it and/or modify
9		* it under the terms of the GNU General Public License as published by
10		* the Free Software Foundation; either version 2 of the License, or
11		* (at your option) any later version.
12		*
13		* This program is distributed in the hope that it will be useful,
14		* but WITHOUT ANY WARRANTY; without even the implied warranty of
15		* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
16		* GNU General Public License for more details.
17		*
18		* You should have received a copy of the GNU General Public License along
19		* with this program; if not, write to the Free Software Foundation, Inc.,
20		* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
21		*
22		* ***
23		*
24		* @package mundschenk-at/php-typography
25		* @license http://www.gnu.org/licenses/gpl-2.0.html
26		*/
27
28		namespace PHP_Typography;
29
30		use PHP_Typography\Fixes\Registry;
31
32		/**
33		* Parses HTML5 (or plain text) and applies various typographic fixes to the text.
34		*
35		* If used with multibyte language, UTF-8 encoding is required.
36		*
37		* Portions of this code have been inspired by:
38		* - typogrify (https://code.google.com/p/typogrify/)
39		* - WordPress code for wptexturize (https://developer.wordpress.org/reference/functions/wptexturize/)
40		* - PHP SmartyPants Typographer (https://michelf.ca/projects/php-smartypants/typographer/)
41		*
42		* @author Jeffrey D. King <[email protected]>
43		* @author Peter Putzer <[email protected]>
44		*/
45		class PHP_Typography {
46
47		/**
48		* A DOM-based HTML5 parser.
49		*
50		* @var \Masterminds\HTML5
51		*/
52		private $html5_parser;
53
54		/**
55		* The hyphenator cache.
56		*
57		* @var Hyphenator\Cache
58		*/
59		protected $hyphenator_cache;
60
61		/**
62		* The node fixes registry.
63		*
64		* @var Registry;
65		*/
66		private $registry;
67
68		/**
69		* Whether the Hyphenator\Cache of the $registry needs to be updated.
70		*
71		* @var bool
72		*/
73		private $update_registry_cache;
74
75		/**
76		* Sets up a new PHP_Typography object.
77		*
78		* @param Registry\|null $registry Optional. A fix registry instance. Default null,
79		* meaning the default fixes are used.
80		*/
81		public function __construct( Registry $registry = null ) {
82		$this->registry = $registry;
83		$this->update_registry_cache = ! empty( $registry );
84		}
85
86		/**
87		* Modifies $html according to the defined settings.
88		*
89		* @param string $html A HTML fragment.
90		* @param Settings $settings A settings object.
91		* @param bool $is_title Optional. If the HTML fragment is a title. Default false.
92		*
93		* @return string The processed $html.
94		*/
95		public function process( $html, Settings $settings, $is_title = false ) {
96		return $this->process_textnodes( $html, [ $this, 'apply_fixes_to_html_node' ], $settings, $is_title );
97		}
98
99		/**
100		* Modifies $html according to the defined settings, in a way that is appropriate for RSS feeds
101		* (i.e. excluding processes that may not display well with limited character set intelligence).
102		*
103		* @param string $html A HTML fragment.
104		* @param Settings $settings A settings object.
105		* @param bool $is_title Optional. If the HTML fragment is a title. Default false.
106		*
107		* @return string The processed $html.
108		*/
109		public function process_feed( $html, Settings $settings, $is_title = false ) {
110		return $this->process_textnodes( $html, [ $this, 'apply_fixes_to_feed_node' ], $settings, $is_title );
111		}
112
113		/**
114		* Applies specific fixes to all textnodes of the HTML fragment.
115		*
116		* @param string $html A HTML fragment.
117		* @param callable $fixer A callback that applies typography fixes to a single textnode.
118		* @param Settings $settings A settings object.
119		* @param bool $is_title Optional. If the HTML fragment is a title. Default false.
120		*
121		* @return string The processed $html.
122		*/
123		public function process_textnodes( $html, callable $fixer, Settings $settings, $is_title = false ) {
124		if ( isset( $settings['ignoreTags'] ) && $is_title && ( in_array( 'h1', $settings['ignoreTags'], true ) \|\| in_array( 'h2', $settings['ignoreTags'], true ) ) ) {
125		return $html;
126		}
127
128		// Lazy-load our parser (the text parser is not needed for feeds).
129		$html5_parser = $this->get_html5_parser();
130
131		// Parse the HTML.
132		$dom = $this->parse_html( $html5_parser, $html, $settings );
133
134		// Abort if there were parsing errors.
135		if ( empty( $dom ) ) {
136		return $html;
137		}
138
139		// Query some nodes in the DOM.
140		$xpath = new \DOMXPath( $dom );
141		$body_node = $xpath->query( '/html/body' )->item( 0 );
142		$all_textnodes = $xpath->query( '//text()', $body_node );
143		$tags_to_ignore = $this->query_tags_to_ignore( $xpath, $body_node, $settings );
144
145		// Start processing.
146		foreach ( $all_textnodes as $textnode ) {
147		if ( self::arrays_intersect( DOM::get_ancestors( $textnode ), $tags_to_ignore ) ) {
148		continue;
149		}
150
151		// We won't be doing anything with spaces, so we can jump ship if that is all we have.
152		if ( $textnode->isWhitespaceInElementContent() ) {
153		continue;
154		}
155
156		// Decode all characters except < > &.
157		$textnode->data = htmlspecialchars( $textnode->data, ENT_NOQUOTES, 'UTF-8' ); // returns < > & to encoded HTML characters (< > and & respectively).
158
159		// Apply fixes.
160		call_user_func( $fixer, $textnode, $settings, $is_title );
161
162		// Until now, we've only been working on a textnode: HTMLify result.
163		$this->replace_node_with_html( $textnode, $textnode->data );
164		}
165
166		return $html5_parser->saveHTML( $body_node->childNodes );
167		}
168
169		/**
170		* Determines whether two object arrays intersect. The second array is expected
171		* to use the spl_object_hash for its keys.
172		*
173		* @param array $array1 The keys are ignored.
174		* @param array $array2 This array has to be in the form ( $spl_object_hash => $object ).
175		*
176		* @return boolean
177		*/
178		protected static function arrays_intersect( array $array1, array $array2 ) {
179		foreach ( $array1 as $value ) {
180		if ( isset( $array2[ spl_object_hash( $value ) ] ) ) {
181		return true;
182		}
183		}
184
185		return false;
186		}
187
188		/**
189		* Applies standard typography fixes to a textnode.
190		*
191		* @param \DOMText $textnode The node to process.
192		* @param Settings $settings The settings to apply.
193		* @param bool $is_title Optional. Default false.
194		*/
195	View Code Duplication	protected function apply_fixes_to_html_node( \DOMText $textnode, Settings $settings, $is_title = false ) {
		0 ignored issues – show Duplication introduced 2017-12-12 21:06 UTC by Report Bug Copy Issue Report This method seems to be duplicated in your project. Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation. You can also find more detailed suggestions in the “Code” section of your repository. Loading history...
196		foreach ( $this->get_registry()->get_node_fixes() as $group => $fixes ) {
197		foreach ( $fixes as $fix ) {
198		$fix->apply( $textnode, $settings, $is_title );
199		}
200		}
201		}
202
203		/**
204		* Applies typography fixes specific to RSS feeds to a textnode.
205		*
206		* @param \DOMText $textnode The node to process.
207		* @param Settings $settings The settings to apply.
208		* @param bool $is_title Optional. Default false.
209		*/
210	View Code Duplication	protected function apply_fixes_to_feed_node( \DOMText $textnode, Settings $settings, $is_title = false ) {
		0 ignored issues – show Duplication introduced 2017-12-12 21:06 UTC by Report Bug Copy Issue Report This method seems to be duplicated in your project. Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation. You can also find more detailed suggestions in the “Code” section of your repository. Loading history...
211		foreach ( $this->get_registry()->get_node_fixes() as $group => $fixes ) {
212		foreach ( $fixes as $fix ) {
213		if ( $fix->feed_compatible() ) {
214		$fix->apply( $textnode, $settings, $is_title );
215		}
216		}
217		}
218		}
219
220		/**
221		* Parse HTML5 fragment while ignoring certain warnings for invalid HTML code (e.g. duplicate IDs).
222		*
223		* @param \Masterminds\HTML5 $parser An intialized parser object.
224		* @param string $html The HTML fragment to parse (not a complete document).
225		* @param Settings $settings The settings to apply.
226		*
227		* @return \DOMDocument\|null The encoding has already been set to UTF-8. Returns null if there were parsing errors.
228		*/
229		public function parse_html( \Masterminds\HTML5 $parser, $html, Settings $settings ) {
230		// Silence some parsing errors for invalid HTML.
231		set_error_handler( [ $this, 'handle_parsing_errors' ] ); // @codingStandardsIgnoreLine
232		$xml_error_handling = libxml_use_internal_errors( true );
233
234		// Do the actual parsing.
235		$dom = $parser->loadHTML( '<!DOCTYPE html><html><body>' . $html . '</body></html>' );
236		$dom->encoding = 'UTF-8';
237
238		// Restore original error handling.
239		libxml_clear_errors();
240		libxml_use_internal_errors( $xml_error_handling );
241		restore_error_handler();
242
243		// Handle any parser errors.
244		$errors = $parser->getErrors();
245		if ( ! empty( $settings['parserErrorsHandler'] ) && ! empty( $errors ) ) {
246		$errors = call_user_func( $settings['parserErrorsHandler'], $errors );
247		}
248
249		// Return null if there are still unhandled parsing errors.
250		if ( ! empty( $errors ) && ! $settings['parserErrorsIgnore'] ) {
251		$dom = null;
252		}
253
254		return $dom;
255		}
256
257		/**
258		* Silently handle certain HTML parsing errors.
259		*
260		* @param int $errno Error number.
261		* @param string $errstr Error message.
262		* @param string $errfile The file in which the error occurred.
263		* @param int $errline The line in which the error occurred.
264		* @param array $errcontext Calling context.
265		*
266		* @return boolean Returns true if the error was handled, false otherwise.
267		*/
268		public function handle_parsing_errors( $errno, $errstr, $errfile, $errline, array $errcontext ) {
		0 ignored issues – show Unused Code introduced 2017-08-07 06:10 UTC by Report Bug Copy Issue Report The parameter `$errline` is not used and could be removed. This check looks from parameters that have been defined for a function or method, but which are not used in the method body. Loading history... Unused Code introduced 2017-08-07 06:10 UTC by Report Bug Copy Issue Report The parameter `$errcontext` is not used and could be removed. This check looks from parameters that have been defined for a function or method, but which are not used in the method body. Loading history...
269		if ( ! ( error_reporting() & $errno ) ) { // @codingStandardsIgnoreLine.
270		return true; // not interesting.
271		}
272
273		// Ignore warnings from parser & let PHP handle the rest.
274		return $errno & E_USER_WARNING && 0 === substr_compare( $errfile, 'DOMTreeBuilder.php', -18 );
275		}
276
277		/**
278		* Retrieves an array of nodes that should be skipped during processing.
279		*
280		* @param \DOMXPath $xpath A valid XPath instance for the DOM to be queried.
281		* @param \DOMNode $initial_node The starting node of the XPath query.
282		* @param Settings $settings The settings to apply.
283		*
284		* @return \DOMNode[] An array of \DOMNode (can be empty).
285		*/
286		public function query_tags_to_ignore( \DOMXPath $xpath, \DOMNode $initial_node, Settings $settings ) {
287		$elements = [];
288		$query_parts = [];
289	View Code Duplication	if ( ! empty( $settings['ignoreTags'] ) ) {
290		$query_parts[] = '//' . implode( ' \| //', $settings['ignoreTags'] );
291		}
292	View Code Duplication	if ( ! empty( $settings['ignoreClasses'] ) ) {
293		$query_parts[] = "//*[contains(concat(' ', @class, ' '), ' " . implode( " ') or contains(concat(' ', @class, ' '), ' ", $settings['ignoreClasses'] ) . " ')]";
294		}
295	View Code Duplication	if ( ! empty( $settings['ignoreIDs'] ) ) {
296		$query_parts[] = '//*[@id=\'' . implode( '\' or @id=\'', $settings['ignoreIDs'] ) . '\']';
297		}
298
299	View Code Duplication	if ( ! empty( $query_parts ) ) {
300		$ignore_query = implode( ' \| ', $query_parts );
301
302		$nodelist = $xpath->query( $ignore_query, $initial_node );
303		if ( false !== $nodelist ) {
304		$elements = DOM::nodelist_to_array( $nodelist );
305		}
306		}
307
308		return $elements;
309		}
310
311		/**
312		* Replaces the given node with HTML content. Uses the HTML5 parser.
313		*
314		* @param \DOMNode $node The node to replace.
315		* @param string $content The HTML fragment used to replace the node.
316		*
317		* @return \DOMNode\|array An array of \DOMNode containing the new nodes or the old \DOMNode if the replacement failed.
318		*/
319		public function replace_node_with_html( \DOMNode $node, $content ) {
320		$result = $node;
321
322		$parent = $node->parentNode;
323		if ( empty( $parent ) ) {
324		return $node; // abort early to save cycles.
325		}
326
327		set_error_handler( [ $this, 'handle_parsing_errors' ] ); // @codingStandardsIgnoreLine.
328
329		$html_fragment = $this->get_html5_parser()->loadHTMLFragment( $content );
330		if ( ! empty( $html_fragment ) ) {
331		$imported_fragment = $node->ownerDocument->importNode( $html_fragment, true );
332
333		if ( ! empty( $imported_fragment ) ) {
334		// Save the children of the imported DOMDocumentFragment before replacement.
335		$children = DOM::nodelist_to_array( $imported_fragment->childNodes );
336
337		if ( false !== $parent->replaceChild( $imported_fragment, $node ) ) {
338		// Success! We return the saved array of DOMNodes as
339		// $imported_fragment is just an empty DOMDocumentFragment now.
340		$result = $children;
341		}
342		}
343		}
344
345		restore_error_handler();
346
347		return $result;
348		}
349
350		/**
351		* Retrieves the fix registry.
352		*
353		* @return Registry
354		*/
355		public function get_registry() {
356		if ( ! isset( $this->registry ) ) {
357		$this->registry = Registry::create( $this->get_hyphenator_cache() );
358		} elseif ( $this->update_registry_cache ) {
359		$this->registry->update_hyphenator_cache( $this->get_hyphenator_cache() );
360		$this->update_registry_cache = false;
361		}
362
363		return $this->registry;
364		}
365
366		/**
367		* Retrieves the HTML5 parser instance.
368		*
369		* @return \Masterminds\HTML5
370		*/
371		public function get_html5_parser() {
372		// Lazy-load HTML5 parser.
373		if ( ! isset( $this->html5_parser ) ) {
374		$this->html5_parser = new \Masterminds\HTML5( [
375		'disable_html_ns' => true,
376		] );
377		}
378
379		return $this->html5_parser;
380		}
381
382		/**
383		* Retrieves the hyphenator cache.
384		*
385		* @return Hyphenator\Cache
386		*/
387		public function get_hyphenator_cache() {
388		if ( ! isset( $this->hyphenator_cache ) ) {
389		$this->hyphenator_cache = new Hyphenator\Cache();
390		}
391
392		return $this->hyphenator_cache;
393		}
394
395		/**
396		* Injects an existing Hyphenator\Cache (to facilitate persistent language caching).
397		*
398		* @param Hyphenator\Cache $cache A hyphenator cache instance.
399		*/
400		public function set_hyphenator_cache( Hyphenator\Cache $cache ) {
401		$this->hyphenator_cache = $cache;
402
403		// Change hyphenator cache for existing token fixes.
404		if ( isset( $this->registry ) ) {
405		$this->registry->update_hyphenator_cache( $cache );
406		}
407		}
408
409		/**
410		* Retrieves the list of valid language plugins in the given directory.
411		*
412		* @param string $path The path in which to look for language plugin files.
413		*
414		* @return string[] An array in the form ( $language_code => $language_name ).
415		*/
416		private static function get_language_plugin_list( $path ) {
417		$language_name_pattern = '/"language"\s*:\s*((".+")|(\'.+\'))\s*,/';
418		$languages = [];
419		$handle = opendir( $path );
420
421		// Read all files in directory.
422		$file = readdir( $handle );
423		while ( false !== $file ) {
424		// We only want the JSON files.
425		if ( '.json' === substr( $file, -5 ) ) {
426		$file_content = file_get_contents( $path . $file );
427		if ( preg_match( $language_name_pattern, $file_content, $matches ) ) {
428		$language_name = substr( $matches[1], 1, -1 );
429		$language_code = substr( $file, 0, -5 );
430
431		$languages[ $language_code ] = $language_name;
432		}
433		}
434
435		// Read next file.
436		$file = readdir( $handle );
437		}
438		closedir( $handle );
439
440		// Sort translated language names according to current locale.
441		asort( $languages );
442
443		return $languages;
444		}
445
446		/**
447		* Retrieves the list of valid hyphenation languages.
448		*
449		* Note that this method reads all the language files on disc, so you should
450		* cache the results if possible.
451		*
452		* @return string[] An array in the form of ( LANG_CODE => LANGUAGE ).
453		*/
454		public static function get_hyphenation_languages() {
455		return self::get_language_plugin_list( __DIR__ . '/lang/' );
456		}
457
458		/**
459		* Retrieves the list of valid diacritic replacement languages.
460		*
461		* Note that this method reads all the language files on disc, so you should
462		* cache the results if possible.
463		*
464		* @return string[] An array in the form of ( LANG_CODE => LANGUAGE ).
465		*/
466		public static function get_diacritic_languages() {
467		return self::get_language_plugin_list( __DIR__ . '/diacritics/' );
468		}
469		}
470
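
For readers unfamiliar with the XPath that `query_tags_to_ignore()` assembles, the following is a minimal standalone sketch of the same idea. It rests on assumptions: `build_ignore_query()` and the sample markup are hypothetical and not part of PHP_Typography, and the built-in `DOMDocument::loadHTML()` stands in for the HTML5 parser the class actually uses. The sketch joins three kinds of selectors (tag names, class names, IDs) into one XPath union. The `concat(' ', @class, ' ')` idiom pads the class attribute with spaces so that a class name only matches as a whole word.

```php
<?php
/**
 * Hypothetical helper (illustration only): builds an XPath union selecting
 * elements by tag name, by class name, or by ID.
 *
 * @param string[] $tags    Tag names to select.
 * @param string[] $classes Class names to select.
 * @param string[] $ids     Element IDs to select.
 *
 * @return string The XPath expression (empty if all lists are empty).
 */
function build_ignore_query( array $tags, array $classes, array $ids ) {
	$query_parts = [];

	if ( ! empty( $tags ) ) {
		// Yields e.g. "//code | //pre".
		$query_parts[] = '//' . implode( ' | //', $tags );
	}
	if ( ! empty( $classes ) ) {
		// Padding @class with spaces makes ' noTypo ' match whole class names only.
		$query_parts[] = "//*[contains(concat(' ', @class, ' '), ' " . implode( " ') or contains(concat(' ', @class, ' '), ' ", $classes ) . " ')]";
	}
	if ( ! empty( $ids ) ) {
		$query_parts[] = "//*[@id='" . implode( "' or @id='", $ids ) . "']";
	}

	// The XPath union operator "|" merges the individual node sets.
	return implode( ' | ', $query_parts );
}

// Sample markup (assumed for this example).
$dom = new DOMDocument();
$dom->loadHTML( '<html><body><p id="a">x</p><code>y</code><span class="foo noTypo">z</span></body></html>' );

$xpath = new DOMXPath( $dom );
$query = build_ignore_query( [ 'code' ], [ 'noTypo' ], [ 'a' ] );
$nodes = $xpath->query( $query, $dom->documentElement );

if ( false !== $nodes ) {
	foreach ( $nodes as $node ) {
		echo $node->nodeName, "\n";
	}
}
```

Note that the values are interpolated into the expression unescaped, as in the listing above, so this approach presumably only suits trusted settings values; a name containing a single quote would break the query.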
