Complex classes like HtmlFormatter often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common way to find such a component is to look for fields and methods that share the same prefix or suffix. You can also look at the cohesion graph to spot any unconnected or weakly connected components.
Once you have determined which fields and methods belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate, and is often faster.
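As a rough illustration of Extract Class, the selector-related members of HtmlFormatter ($itemsToRemove, remove(), parseSelector(), parseItemsToRemove()) look like one cohesive group. The sketch below moves them into a separate class. The class name RemovalSelectors, its method names, and all method bodies are assumptions made for illustration; the real method bodies are not shown in this report, so this is not the actual HtmlFormatter implementation.

```php
<?php
// Hypothetical extracted class. Name and parsing logic are illustrative assumptions.
final class RemovalSelectors {
    /** @var string[] Raw CSS-style selectors queued for removal */
    private $selectors = [];

    /** @param array|string $selectors Selector(s) of content to remove */
    public function add( $selectors ): void {
        $this->selectors = array_merge( $this->selectors, (array)$selectors );
    }

    /**
     * Groups the queued selectors by type.
     * @return array Map of type (ID, CLASS, TAG_CLASS, TAG) to raw names
     */
    public function parse(): array {
        $parsed = [ 'ID' => [], 'CLASS' => [], 'TAG_CLASS' => [], 'TAG' => [] ];
        foreach ( $this->selectors as $selector ) {
            [ $type, $rawName ] = $this->parseSelector( $selector );
            $parsed[$type][] = $rawName;
        }
        return $parsed;
    }

    /** @return array Two elements: selector type and raw name */
    private function parseSelector( string $selector ): array {
        if ( strpos( $selector, '#' ) === 0 ) {
            return [ 'ID', substr( $selector, 1 ) ];
        }
        if ( strpos( $selector, '.' ) === 0 ) {
            return [ 'CLASS', substr( $selector, 1 ) ];
        }
        if ( strpos( $selector, '.' ) !== false ) {
            // Assumption: keep "tag.class" whole as the raw name.
            return [ 'TAG_CLASS', $selector ];
        }
        if ( $selector !== '' ) {
            return [ 'TAG', $selector ];
        }
        throw new InvalidArgumentException( 'Empty selector' );
    }
}

// Trimmed-down formatter showing only the delegation; everything else is left out.
class HtmlFormatter {
    private $html;
    private $removalSelectors;

    public function __construct( $html ) {
        $this->html = $html;
        $this->removalSelectors = new RemovalSelectors();
    }

    public function remove( $selectors ) {
        $this->removalSelectors->add( $selectors );
    }

    protected function parseItemsToRemove() {
        return $this->removalSelectors->parse();
    }
}
```

The public remove() method keeps its signature, so existing callers would not need to change, while the parsing logic can now be tested without building a DOM document.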
While breaking up the class, it is a good idea to analyze how other classes use HtmlFormatter, and based on these observations, apply Extract Interface, too.
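A minimal sketch of Extract Interface, assuming that most callers only queue selectors, run the filter, and read the result. The interface name HtmlFilter, the choice of methods, and the renderExcerpt() caller are hypothetical; only the three method signatures are taken from the listing in this report.

```php
<?php
// Hypothetical interface covering the subset of HtmlFormatter that callers are assumed to use.
interface HtmlFilter {
    /** @param array|string $selectors Selector(s) of content to remove */
    public function remove( $selectors );

    /** @return array Array of removed elements */
    public function filterContent();

    /** @return string Processed HTML */
    public function getText( $element = null );
}

// Illustrative caller: it depends on the interface, not on the concrete class.
function renderExcerpt( HtmlFilter $filter ): string {
    $filter->remove( [ '.thumb', 'table' ] );
    $filter->filterContent();
    return $filter->getText();
}
```

HtmlFormatter would then declare that it implements HtmlFilter. Callers that type-hint against the interface can be handed a lightweight stand-in in tests, and they keep working if the formatter is later split into smaller classes.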
    <?php
    // ... (lines omitted in this listing)

    class HtmlFormatter {
        /**
         * @var DOMDocument
         */
        private $doc;

        private $html;
        private $itemsToRemove = [];
        private $elementsToFlatten = [];
        protected $removeMedia = false;

        /**
         * Constructor
         *
         * @param string $html Text to process
         */
        public function __construct( $html ) {
            // ... (body omitted in this listing)
        }

        /**
         * Turns a chunk of HTML into a proper document
         * @param string $html
         * @return string
         */
        public static function wrapHTML( $html ) {
            // ... (body omitted in this listing)
        }

        /**
         * Override this in descendant class to modify HTML after it has been converted from DOM tree
         * @param string $html HTML to process
         * @return string Processed HTML
         */
        protected function onHtmlReady( $html ) {
            // ... (body omitted in this listing)
        }

        /**
         * @return DOMDocument DOM to manipulate
         */
        public function getDoc() {
            // ... (body omitted in this listing)
        }

        /**
         * Sets whether images/videos/sounds should be removed from output
         * @param bool $flag
         */
        public function setRemoveMedia( $flag = true ) {
            // ... (body omitted in this listing)
        }

        /**
         * Adds one or more selector of content to remove. A subset of CSS selector
         * syntax is supported:
         *
         *   <tag>
         *   <tag>.class
         *   .<class>
         *   #<id>
         *
         * @param array|string $selectors Selector(s) of stuff to remove
         */
        public function remove( $selectors ) {
            // ... (body omitted in this listing)
        }

        /**
         * Adds one or more element name to the list to flatten (remove tag, but not its content)
         * Can accept undelimited regexes
         *
         * Note this interface may fail in surprising unexpected ways due to usage of regexes,
         * so should not be relied on for HTML markup security measures.
         *
         * @param array|string $elements Name(s) of tag(s) to flatten
         */
        public function flatten( $elements ) {
            // ... (body omitted in this listing)
        }

        /**
         * Instructs the formatter to flatten all tags
         */
        public function flattenAllTags() {
            // ... (body omitted in this listing)
        }

        /**
         * Removes content we've chosen to remove. The text of the removed elements can be
         * extracted with the getText method.
         * @return array Array of removed DOMElements
         */
        public function filterContent() {
            // ... (body omitted in this listing)
        }

        /**
         * Removes a list of elements from DOMDocument
         * @param array|DOMNodeList $elements
         * @return array Array of removed elements
         */
        private function removeElements( $elements ) {
            // ... (body omitted in this listing)
        }

        /**
         * libxml in its usual pointlessness converts many chars to entities - this function
         * performs a reverse conversion
         * @param string $html
         * @return string
         */
        private function fixLibXML( $html ) {
            // ... (body omitted in this listing)
        }

        /**
         * Performs final transformations and returns resulting HTML. Note that if you want to call this
         * both without an element and with an element you should call it without an element first. If you
         * specify the $element in the method it'll change the underlying dom and you won't be able to get
         * it back.
         *
         * @param DOMElement|string|null $element ID of element to get HTML from or
         *   false to get it from the whole tree
         * @return string Processed HTML
         */
        public function getText( $element = null ) {
            // ... (body omitted in this listing)
        }

        /**
         * Helper function for parseItemsToRemove(). This function extracts the selector type
         * and the raw name of a selector from a CSS-style selector string and assigns those
         * values to parameters passed by reference. For example, if given '#toc' as the
         * $selector parameter, it will assign 'ID' as the $type and 'toc' as the $rawName.
         * @param string $selector CSS selector to parse
         * @param string $type The type of selector (ID, CLASS, TAG_CLASS, or TAG)
         * @param string $rawName The raw name of the selector
         * @return bool Whether the selector was successfully recognised
         * @throws MWException
         */
        protected function parseSelector( $selector, &$type, &$rawName ) {
            // ... (body omitted in this listing)
        }

        /**
         * Transforms CSS-style selectors into an internal representation suitable for
         * processing by filterContent()
         * @return array
         */
        protected function parseItemsToRemove() {
            // ... (body omitted in this listing)
        }
    }

This check marks implicit conversions of arrays to boolean values in a comparison. While in PHP an empty array is considered to be equal (but not identical) to false, this is not always apparent.
Consider making the comparison explicit by using empty(...) or !empty(...) instead.
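The difference can be sketched as follows. The function names and messages are made up for illustration; only the comparison styles matter. Both functions behave the same for arrays, but the second states the intent directly.

```php
<?php
// Implicit: the array is converted to a boolean, and an empty array converts to false.
function describeImplicit( array $items ): string {
    if ( !$items ) {
        return 'nothing to remove';
    }
    return count( $items ) . ' item(s) to remove';
}

// Explicit: empty() makes it obvious that emptiness is being checked.
function describeExplicit( array $items ): string {
    if ( empty( $items ) ) {
        return 'nothing to remove';
    }
    return count( $items ) . ' item(s) to remove';
}

var_dump( [] == false );  // bool(true): equal after type juggling
var_dump( [] === false ); // bool(false): not identical, the types differ
```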