Complex classes like HTMLPurifier_Lexer_DOMLex often do a lot of different things. To break such a class down, we need to identify a cohesive component within it. A common way to find one is to look for fields and methods that share the same prefix or suffix. You can also inspect the cohesion graph to spot unconnected or weakly connected components.
Once you have determined which fields and methods belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster to apply.
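As a rough illustration of Extract Class, the two comment callbacks in this class share the `callback` prefix and could plausibly move into a small collaborator. The sketch below is hypothetical: the class name `CommentArmor`, the method names, the regular expression, and both method bodies are assumptions made for illustration, not HTML Purifier's actual code. It also assumes PHP 7.0 or later for the type declarations.

```php
<?php
// Hypothetical extracted class. The name CommentArmor and both method bodies
// are illustrative assumptions, not taken from HTML Purifier.
final class CommentArmor
{
    /** Entity-izes ampersands inside a matched comment. */
    public function armorEntities(array $matches): string
    {
        return '<!--' . str_replace('&', '&amp;', $matches[1]) . $matches[2];
    }

    /** Undoes escaping of ampersands and stray angle brackets in a matched comment. */
    public function undoSubst(array $matches): string
    {
        return '<!--' . strtr($matches[1], ['&amp;' => '&', '&lt;' => '<']) . $matches[2];
    }
}

// The lexer would delegate to the collaborator instead of hosting the callbacks.
$armor = new CommentArmor();
$html = preg_replace_callback(
    '/<!--(.*?)(-->|\z)/s',      // assumed pattern: comment body, then terminator
    [$armor, 'armorEntities'],
    '<!-- a & b -->'
);
```

After such a move, the lexer keeps only a reference to the collaborator, and the comment handling can be tested without building a DOM.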
While breaking up the class, it is a good idea to analyze how other classes use HTMLPurifier_Lexer_DOMLex, and based on these observations, apply Extract Interface, too.
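A minimal sketch of Extract Interface, assuming that callers only ever invoke `tokenizeHTML()`. The interface name `HtmlTokenizer`, the stand-in class `WhitespaceTokenizer`, and the helper `countTokens()` are hypothetical and exist only to keep the example runnable; only the `tokenizeHTML($html, $config, $context)` signature comes from the class in this report.

```php
<?php
// Hypothetical interface: the name is an assumption, the method signature
// mirrors the public tokenizeHTML() method of the class in this report.
interface HtmlTokenizer
{
    /**
     * @param string $html
     * @param mixed $config
     * @param mixed $context
     * @return array list of tokens
     */
    public function tokenizeHTML($html, $config, $context);
}

// Stand-in implementation, used only to make the sketch self-contained.
final class WhitespaceTokenizer implements HtmlTokenizer
{
    public function tokenizeHTML($html, $config, $context)
    {
        return preg_split('/\s+/', trim($html), -1, PREG_SPLIT_NO_EMPTY);
    }
}

// Client code depends on the interface, not on a concrete lexer class.
function countTokens(HtmlTokenizer $tokenizer, $html)
{
    return count($tokenizer->tokenizeHTML($html, null, null));
}
```

Typing clients against the narrow interface makes it easier to split the concrete class later without touching its callers.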
    <?php
    // ...
    class HTMLPurifier_Lexer_DOMLex extends HTMLPurifier_Lexer
    {

        /**
         * @type HTMLPurifier_TokenFactory
         */
        private $factory;

        public function __construct()
        // ...

        /**
         * @param string $html
         * @param HTMLPurifier_Config $config
         * @param HTMLPurifier_Context $context
         * @return HTMLPurifier_Token[]
         */
        public function tokenizeHTML($html, $config, $context)
        // ...

        /**
         * Iterative function that tokenizes a node, putting it into an accumulator.
         * To iterate is human, to recurse divine - L. Peter Deutsch
         * @param DOMNode $node DOMNode to be tokenized.
         * @param HTMLPurifier_Token[] $tokens Array-list of already tokenized tokens.
         * @return HTMLPurifier_Token of node appended to previously passed tokens.
         */
        protected function tokenizeDOM($node, &$tokens)
        // ...

        /**
         * @param DOMNode $node DOMNode to be tokenized.
         * @param HTMLPurifier_Token[] $tokens Array-list of already tokenized tokens.
         * @param bool $collect Says whether or start and close are collected, set to
         *                    false at first recursion because it's the implicit DIV
         *                    tag you're dealing with.
         * @return bool if the token needs an endtoken
         * @todo data and tagName properties don't seem to exist in DOMNode?
         */
        protected function createStartNode($node, &$tokens, $collect)
        // ...

        /**
         * @param DOMNode $node
         * @param HTMLPurifier_Token[] $tokens
         */
        protected function createEndNode($node, &$tokens)
        // ...

        /**
         * Converts a DOMNamedNodeMap of DOMAttr objects into an assoc array.
         *
         * @param DOMNamedNodeMap $node_map DOMNamedNodeMap of DOMAttr objects.
         * @return array Associative array of attributes.
         */
        protected function transformAttrToAssoc($node_map)
        // ...

        /**
         * An error handler that mutes all errors
         * @param int $errno
         * @param string $errstr
         */
        public function muteErrorHandler($errno, $errstr)
        // ...

        /**
         * Callback function for undoing escaping of stray angled brackets
         * in comments
         * @param array $matches
         * @return string
         */
        public function callbackUndoCommentSubst($matches)
        // ...

        /**
         * Callback function that entity-izes ampersands in comments so that
         * callbackUndoCommentSubst doesn't clobber them
         * @param array $matches
         * @return string
         */
        public function callbackArmorCommentEntities($matches)
        // ...

        /**
         * Wraps an HTML fragment in the necessary HTML
         * @param string $html
         * @param HTMLPurifier_Config $config
         * @param HTMLPurifier_Context $context
         * @return string
         */
        protected function wrapHTML($html, $config, $context)
        // ...
    }
The class is also declared in the global namespace. You can fix this by adding a namespace to your class.
When choosing a vendor namespace, pick something specific enough to avoid conflicts with other libraries.
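A minimal sketch of a namespaced class, assuming PHP 5.5 or later for the `::class` constant. The vendor namespace `AcmeHtml` and the class name `DomLex` are placeholders chosen for illustration, not names used by the library.

```php
<?php
// "AcmeHtml" is a placeholder vendor namespace, assumed for illustration only.
namespace AcmeHtml\Lexer;

class DomLex
{
}

// The fully qualified class name now carries the vendor prefix.
echo DomLex::class, "\n"; // prints AcmeHtml\Lexer\DomLex
```

A vendor prefix tied to your project or organization keeps the class from colliding with an identically named class in another library.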