Complex classes like HTMLPurifier_Lexer_DOMLex often have several unrelated responsibilities. To break such a class down, first identify a cohesive component within it. A common approach is to look for fields and methods that share the same prefix or suffix. You can also inspect the cohesion graph to spot unconnected or weakly connected components.
Once you have determined which fields and methods belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate, and it is often the quicker change to make.
While breaking up the class, it is also a good idea to analyze how other classes use HTMLPurifier_Lexer_DOMLex and, based on these observations, to apply Extract Interface as well.
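As a rough illustration of Extract Class combined with Extract Interface, the two methods in the listing that share the `callback` prefix (callbackUndoCommentSubst and callbackArmorCommentEntities) could move into a collaborator of their own. Everything in this sketch is an assumption made for illustration: the names CommentEscaper, DefaultCommentEscaper and SlimLexer are hypothetical, and the method bodies and the regular expression are stand-ins, not the library's code.

```php
<?php
// Hypothetical interface (Extract Interface): what the lexer needs from the helper.
interface CommentEscaper
{
    public function armorEntities(array $matches): string;
    public function undoSubst(array $matches): string;
}

// Hypothetical extracted class (Extract Class): owns the comment callbacks.
final class DefaultCommentEscaper implements CommentEscaper
{
    // Illustrative body: entity-ize ampersands inside a matched comment.
    public function armorEntities(array $matches): string
    {
        return '<!--' . str_replace('&', '&amp;', $matches[1]) . $matches[2];
    }

    // Illustrative body: undo the escaping of ampersands and angle brackets.
    public function undoSubst(array $matches): string
    {
        return '<!--' . strtr($matches[1], ['&amp;' => '&', '&lt;' => '<']) . $matches[2];
    }
}

// Hypothetical slimmed-down lexer: delegates instead of hosting the callbacks.
class SlimLexer
{
    /** @var CommentEscaper */
    private $escaper;

    public function __construct(CommentEscaper $escaper)
    {
        $this->escaper = $escaper;
    }

    public function armorComments(string $html): string
    {
        // The pattern is an assumption for this sketch only.
        return preg_replace_callback(
            '/<!--(.*?)(-->|\z)/is',
            [$this->escaper, 'armorEntities'],
            $html
        );
    }
}
```

With this split, the lexer depends only on the small interface, so the escaping logic can be tested or replaced without touching the tokenizing code.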
<?php
// ...

class HTMLPurifier_Lexer_DOMLex extends HTMLPurifier_Lexer
{

    /**
     * @type HTMLPurifier_TokenFactory
     */
    private $factory;

    public function __construct()
    // ...

    /**
     * @param string $html
     * @param HTMLPurifier_Config $config
     * @param HTMLPurifier_Context $context
     * @return HTMLPurifier_Token[]
     */
    public function tokenizeHTML($html, $config, $context)
    // ...

    /**
     * Iterative function that tokenizes a node, putting it into an accumulator.
     * To iterate is human, to recurse divine - L. Peter Deutsch
     * @param DOMNode $node DOMNode to be tokenized.
     * @param HTMLPurifier_Token[] $tokens Array-list of already tokenized tokens.
     * @return HTMLPurifier_Token of node appended to previously passed tokens.
     */
    protected function tokenizeDOM($node, &$tokens)
    // ...

    /**
     * @param DOMNode $node DOMNode to be tokenized.
     * @param HTMLPurifier_Token[] $tokens Array-list of already tokenized tokens.
     * @param bool $collect Says whether or start and close are collected, set to
     *                    false at first recursion because it's the implicit DIV
     *                    tag you're dealing with.
     * @return bool if the token needs an endtoken
     * @todo data and tagName properties don't seem to exist in DOMNode?
     */
    protected function createStartNode($node, &$tokens, $collect)
    // ...

    /**
     * @param DOMNode $node
     * @param HTMLPurifier_Token[] $tokens
     */
    protected function createEndNode($node, &$tokens)
    // ...

    /**
     * Converts a DOMNamedNodeMap of DOMAttr objects into an assoc array.
     *
     * @param DOMNamedNodeMap $node_map DOMNamedNodeMap of DOMAttr objects.
     * @return array Associative array of attributes.
     */
    protected function transformAttrToAssoc($node_map)
    // ...

    /**
     * An error handler that mutes all errors
     * @param int $errno
     * @param string $errstr
     */
    public function muteErrorHandler($errno, $errstr)
    // ...

    /**
     * Callback function for undoing escaping of stray angled brackets
     * in comments
     * @param array $matches
     * @return string
     */
    public function callbackUndoCommentSubst($matches)
    // ...

    /**
     * Callback function that entity-izes ampersands in comments so that
     * callbackUndoCommentSubst doesn't clobber them
     * @param array $matches
     * @return string
     */
    public function callbackArmorCommentEntities($matches)
    // ...

    /**
     * Wraps an HTML fragment in the necessary HTML
     * @param string $html
     * @param HTMLPurifier_Config $config
     * @param HTMLPurifier_Context $context
     * @return string
     */
    protected function wrapHTML($html, $config, $context)
    // ...
}
You can fix this by declaring your class inside a namespace.
When choosing a vendor namespace, pick something specific enough to avoid conflicts with other libraries.
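A minimal sketch of such a declaration follows. The vendor name `YourVendor`, the sub-namespace `Lexer` and the class name `DOMLex` are placeholders chosen for illustration, not names taken from the library.

```php
<?php
// Placeholder vendor namespace; replace with a name specific to your project.
namespace YourVendor\Lexer;

// The class now lives at \YourVendor\Lexer\DOMLex instead of the global namespace.
class DOMLex
{
}
```

Code in other files would then refer to the class by its fully qualified name or import it with a `use YourVendor\Lexer\DOMLex;` statement.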