Duplicate code is one of the most pungent code smells. A common rule of thumb is to restructure code once it is duplicated in three or more places.
Common duplication problems and their corresponding solutions are:
Complex classes like HTMLPurifier_Lexer_DOMLex often do a lot of different things. To break such a class down, we need to identify a cohesive component within it. A common way to find such a component is to look for fields and methods that share the same prefix or suffix. You can also look at the cohesion graph to spot any unconnected or weakly connected components.
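The prefix heuristic can be tried out with a few lines of PHP. The following is a minimal sketch, not part of the library: `groupByPrefix` is a hypothetical helper, and it assumes camelCase method names whose prefix is the leading run of lowercase letters. The method names are the ones declared by HTMLPurifier_Lexer_DOMLex.

```php
<?php
// Hypothetical helper: groups method names by their leading lowercase word
// (the camelCase prefix), keeping only prefixes shared by two or more methods.
function groupByPrefix(array $methodNames): array
{
    $groups = [];
    foreach ($methodNames as $name) {
        // Take the run of lowercase letters at the start,
        // e.g. "tokenize" in "tokenizeDOM".
        if (preg_match('/^[a-z]+/', $name, $m) === 1) {
            $groups[$m[0]][] = $name;
        }
    }
    // A prefix used by a single method says nothing about cohesion.
    return array_filter($groups, function (array $names): bool {
        return count($names) > 1;
    });
}

$methods = [
    'tokenizeHTML', 'tokenizeDOM', 'createStartNode', 'createEndNode',
    'transformAttrToAssoc', 'muteErrorHandler',
    'callbackUndoCommentSubst', 'callbackArmorCommentEntities', 'wrapHTML',
];

print_r(groupByPrefix($methods));
```

For this class the sketch reports three candidate groups: `tokenize`, `create`, and `callback`. A shared prefix is only a hint; whether a group really forms a cohesive component still has to be checked against the fields and calls the methods share.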
Once you have determined which fields and methods belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate and is often faster to apply.
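As an illustration of Extract Class, the two comment-related callbacks of the class could move into a class of their own. The sketch below is an assumption throughout: `CommentEscaper` and `LexerSketch` are hypothetical names, and the method bodies and the regular expression are simplified stand-ins, not the library's code.

```php
<?php
// Hypothetical extracted class: owns only the comment-escaping callbacks.
final class CommentEscaper
{
    /**
     * Entity-izes ampersands inside a matched comment.
     * @param array $matches [full match, comment body, closing delimiter]
     * @return string
     */
    public function armorEntities(array $matches): string
    {
        return '<!--' . str_replace('&', '&amp;', $matches[1]) . $matches[2];
    }

    /**
     * Restores ampersands and angle brackets inside a matched comment.
     * @param array $matches [full match, comment body, closing delimiter]
     * @return string
     */
    public function undoSubst(array $matches): string
    {
        return '<!--' . strtr($matches[1], ['&amp;' => '&', '&lt;' => '<']) . $matches[2];
    }
}

// Stand-in for the original class: it delegates to the extracted class
// instead of defining the callbacks itself.
class LexerSketch
{
    /** @var CommentEscaper */
    private $comments;

    public function __construct(?CommentEscaper $comments = null)
    {
        $this->comments = $comments ?? new CommentEscaper();
    }

    public function armorComments(string $html): string
    {
        // Simplified pattern (an assumption): a comment up to "-->" or end of input.
        return preg_replace_callback(
            '/<!--(.*?)(-->|\z)/s',
            [$this->comments, 'armorEntities'],
            $html
        );
    }
}

$lexer = new LexerSketch();
echo $lexer->armorComments('<!-- a & b -->'), "\n"; // prints "<!-- a &amp; b -->"
```

After the extraction the comment handling can be tested on its own, and the original class shrinks by two public methods that existed only to serve as callbacks.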
While breaking up the class, it is a good idea to analyze how other classes use HTMLPurifier_Lexer_DOMLex and, based on these observations, to apply Extract Interface as well.
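A minimal sketch of Extract Interface follows, assuming that callers only ever need `tokenizeHTML`. The names `TokenizerInterface`, `NaiveTokenizer`, and `countTokens` are hypothetical, and the toy implementation merely splits on tags so that the example runs on its own.

```php
<?php
// Hypothetical interface: the single method callers are assumed to depend on.
interface TokenizerInterface
{
    /**
     * @param string $html
     * @param mixed $config
     * @param mixed $context
     * @return array
     */
    public function tokenizeHTML($html, $config, $context);
}

// Toy implementation for demonstration only: splits the input on tags.
final class NaiveTokenizer implements TokenizerInterface
{
    public function tokenizeHTML($html, $config, $context)
    {
        return preg_split(
            '/(<[^>]+>)/',
            $html,
            -1,
            PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
        );
    }
}

// Client code is typed against the interface, not a concrete lexer.
function countTokens(TokenizerInterface $tokenizer, string $html): int
{
    return count($tokenizer->tokenizeHTML($html, null, null));
}

echo countTokens(new NaiveTokenizer(), '<b>hi</b>'), "\n"; // prints 3
```

Because `countTokens` knows only the interface, a different lexer can be substituted without touching the client code.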
```php
<?php
// ...

class HTMLPurifier_Lexer_DOMLex extends HTMLPurifier_Lexer
{

    /**
     * @type HTMLPurifier_TokenFactory
     */
    private $factory;

    public function __construct()
    // ...

    /**
     * @param string $html
     * @param HTMLPurifier_Config $config
     * @param HTMLPurifier_Context $context
     * @return HTMLPurifier_Token[]
     */
    public function tokenizeHTML($html, $config, $context)
    // ...

    /**
     * Iterative function that tokenizes a node, putting it into an accumulator.
     * To iterate is human, to recurse divine - L. Peter Deutsch
     * @param DOMNode $node DOMNode to be tokenized.
     * @param HTMLPurifier_Token[] $tokens Array-list of already tokenized tokens.
     * @return HTMLPurifier_Token of node appended to previously passed tokens.
     */
    protected function tokenizeDOM($node, &$tokens, $config)
    // ...

    /**
     * @param DOMNode $node DOMNode to be tokenized.
     * @param HTMLPurifier_Token[] $tokens Array-list of already tokenized tokens.
     * @param bool $collect Says whether or start and close are collected, set to
     *                    false at first recursion because it's the implicit DIV
     *                    tag you're dealing with.
     * @return bool if the token needs an endtoken
     * @todo data and tagName properties don't seem to exist in DOMNode?
     */
    protected function createStartNode($node, &$tokens, $collect, $config)
    // ...

    /**
     * @param DOMNode $node
     * @param HTMLPurifier_Token[] $tokens
     */
    protected function createEndNode($node, &$tokens)
    // ...

    /**
     * Converts a DOMNamedNodeMap of DOMAttr objects into an assoc array.
     *
     * @param DOMNamedNodeMap $node_map DOMNamedNodeMap of DOMAttr objects.
     * @return array Associative array of attributes.
     */
    protected function transformAttrToAssoc($node_map)
    // ...

    /**
     * An error handler that mutes all errors
     * @param int $errno
     * @param string $errstr
     */
    public function muteErrorHandler($errno, $errstr)
    // ...

    /**
     * Callback function for undoing escaping of stray angled brackets
     * in comments
     * @param array $matches
     * @return string
     */
    public function callbackUndoCommentSubst($matches)
    // ...

    /**
     * Callback function that entity-izes ampersands in comments so that
     * callbackUndoCommentSubst doesn't clobber them
     * @param array $matches
     * @return string
     */
    public function callbackArmorCommentEntities($matches)
    // ...

    /**
     * Wraps an HTML fragment in the necessary HTML
     * @param string $html
     * @param HTMLPurifier_Config $config
     * @param HTMLPurifier_Context $context
     * @return string
     */
    protected function wrapHTML($html, $config, $context, $use_div = true)
    // ...
}
```
This check compares the return type specified in the @return annotation of a function or method doc comment with the types returned by the function and raises an issue if they mismatch.
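The kind of mismatch this check targets can be shown with a small, self-contained sketch. The two functions below are hypothetical and not taken from the library; how the check infers the returned types is not described here, so the example only shows a doc comment that disagrees with the code and one way to make the two agree.

```php
<?php
/**
 * Mismatch: the doc comment promises a string, but nothing is returned.
 *
 * @param string[] $tokens
 * @param string $token
 * @return string
 */
function appendTokenMismatch(array &$tokens, string $token)
{
    $tokens[] = $token;
}

/**
 * Consistent: the annotation matches what the function actually does.
 *
 * @param string[] $tokens
 * @param string $token
 * @return void
 */
function appendToken(array &$tokens, string $token): void
{
    $tokens[] = $token;
}

$tokens = [];
appendToken($tokens, 'a');
// The mismatching function yields NULL at runtime, not the documented string.
var_dump(appendTokenMismatch($tokens, 'b'));
```

Either the annotation or the implementation can be changed to resolve such an issue; which one is right depends on what callers expect.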