HTMLPurifier_Lexer_DOMLex::wrapHTML()   B
Complexity

Conditions 5
Paths 5

Size

Total Lines 22
Code Lines 14

Duplication

Lines 0
Ratio 0 %

Importance

Changes 1
Bugs 0 Features 0
Metric  Value
cc      5
eloc    14
c       1
b       0
f       0
nc      5
nop     3
dl      0
loc     22
rs      8.6737
<?php

/**
 * Parser that uses PHP 5's DOM extension (part of the core).
 *
 * In PHP 5, the DOM XML extension was revamped into DOM and added to the core.
 * It gives us a forgiving HTML parser, which we use to transform the HTML
 * into a DOM, and then into the tokens.  It is blazingly fast (for large
 * documents, it performs twenty times faster than
 * HTMLPurifier_Lexer_DirectLex), and is the default choice for PHP 5.
 *
 * @note Any empty elements will have empty tokens associated with them, even if
 * this is prohibited by the spec. This cannot be fixed until the spec
 * comes into play.
 *
 * @note PHP's DOM extension does not actually parse any entities, we use
 *       our own function to do that.
 *
 * @warning DOM tends to drop whitespace, which may wreak havoc on indenting.
 *          If this is a huge problem, due to the fact that HTML is hand
 *          edited and you are unable to get a parser cache that caches the
 *          output of HTML Purifier while keeping the original HTML lying
 *          around, you may want to run Tidy on the resulting output or use
 *          HTMLPurifier_Lexer_DirectLex
 */

class HTMLPurifier_Lexer_DOMLex extends HTMLPurifier_Lexer
Coding Style (Compatibility): PSR1 recommends that each class must be in a namespace of at least one level to avoid collisions.

You can fix this by adding a namespace to your class:

namespace YourVendor;

class YourClass { }

When choosing a vendor namespace, try to pick something that is not too generic to avoid conflicts with other libraries.
{

    /**
     * @type HTMLPurifier_TokenFactory
     */
    private $factory;

    public function __construct()
    {
        // setup the factory
        parent::__construct();
        $this->factory = new HTMLPurifier_TokenFactory();
    }

    /**
     * @param string $html
     * @param HTMLPurifier_Config $config
     * @param HTMLPurifier_Context $context
     * @return HTMLPurifier_Token[]
     */
    public function tokenizeHTML($html, $config, $context)
    {
        $html = $this->normalize($html, $config, $context);

        // attempt to armor stray angled brackets that cannot possibly
        // form tags and thus are probably being used as emoticons
        if ($config->get('Core.AggressivelyFixLt')) {
            $char = '[^a-z!\/]';
            $comment = "/<!--(.*?)(-->|\z)/is";
            $html = preg_replace_callback($comment, array($this, 'callbackArmorCommentEntities'), $html);
            do {
                $old = $html;
                $html = preg_replace("/<($char)/i", '&lt;\\1', $html);
            } while ($html !== $old);
            $html = preg_replace_callback($comment, array($this, 'callbackUndoCommentSubst'), $html); // fix comments
        }

        // preprocess html, essential for UTF-8
        $html = $this->wrapHTML($html, $config, $context);

        $doc = new DOMDocument();
        $doc->encoding = 'UTF-8'; // theoretically, the above has this covered

        set_error_handler(array($this, 'muteErrorHandler'));
        $doc->loadHTML($html);
        restore_error_handler();

        $tokens = array();
        $this->tokenizeDOM(
            $doc->getElementsByTagName('html')->item(0)-> // <html>
            getElementsByTagName('body')->item(0), //   <body>
            $tokens
        );
        return $tokens;
    }

    /**
     * Iterative function that tokenizes a node, putting it into an accumulator.
     * To iterate is human, to recurse divine - L. Peter Deutsch
     * @param DOMNode $node DOMNode to be tokenized.
     * @param HTMLPurifier_Token[] $tokens   Array-list of already tokenized tokens.
     * @return HTMLPurifier_Token of node appended to previously passed tokens.
     */
    protected function tokenizeDOM($node, &$tokens)
    {
        $level = 0;
        $nodes = array($level => new HTMLPurifier_Queue(array($node)));
        $closingNodes = array();
        do {
            while (!$nodes[$level]->isEmpty()) {
                $node = $nodes[$level]->shift(); // FIFO
                $collect = $level > 0;
                $needEndingTag = $this->createStartNode($node, $tokens, $collect);
                if ($needEndingTag) {
                    $closingNodes[$level][] = $node;
                }
                if ($node->childNodes && $node->childNodes->length) {
                    $level++;
                    $nodes[$level] = new HTMLPurifier_Queue();
                    foreach ($node->childNodes as $childNode) {
                        $nodes[$level]->push($childNode);
                    }
                }
            }
            $level--;
            if ($level && isset($closingNodes[$level])) {
                while ($node = array_pop($closingNodes[$level])) {
                    $this->createEndNode($node, $tokens);
                }
            }
        } while ($level > 0);
    }

    /**
     * @param DOMNode $node DOMNode to be tokenized.
     * @param HTMLPurifier_Token[] $tokens   Array-list of already tokenized tokens.
     * @param bool $collect  Says whether start and close tokens are collected, set to
     *                    false at first recursion because it's the implicit DIV
     *                    tag you're dealing with.
     * @return bool if the token needs an endtoken
     * @todo data and tagName properties don't seem to exist in DOMNode?
     */
    protected function createStartNode($node, &$tokens, $collect)
    {
        // intercept non element nodes. WE MUST catch all of them,
        // but we're not getting the character reference nodes because
        // those should have been preprocessed
        if ($node->nodeType === XML_TEXT_NODE) {
            $tokens[] = $this->factory->createText($node->data);
Bug: The property data does not seem to exist in DOMNode.

An attempt at access to an undefined property has been detected. This may either be a typographical error or the property has been renamed but there are still references to its old name.

If you really want to allow access to undefined properties, you can define magic methods to allow access. See the PHP core documentation on Overloading.
            return false;
        } elseif ($node->nodeType === XML_CDATA_SECTION_NODE) {
            // undo libxml's special treatment of <script> and <style> tags
            $last = end($tokens);
            $data = $node->data;
            // (note $node->tagname is already normalized)
            if ($last instanceof HTMLPurifier_Token_Start && ($last->name == 'script' || $last->name == 'style')) {
                $new_data = trim($data);
                if (substr($new_data, 0, 4) === '<!--') {
                    $data = substr($new_data, 4);
                    if (substr($data, -3) === '-->') {
                        $data = substr($data, 0, -3);
                    } else {
Unused Code: This else statement is empty and can be removed.

This check looks for the else branches of if statements that have no statements or where all statements have been commented out. This may be the result of changes for debugging or the code may simply be obsolete.

These else branches can be removed.

if (rand(1, 6) > 3) {
    print "Check failed";
} else {
    //print "Check succeeded";
}

could be turned into

if (rand(1, 6) > 3) {
    print "Check failed";
}

This is much more concise to read.
                        // Highly suspicious! Not sure what to do...
                    }
                }
            }
            $tokens[] = $this->factory->createText($this->parseData($data));
            return false;
        } elseif ($node->nodeType === XML_COMMENT_NODE) {
            // this code is only invoked for comments in script/style in versions
            // of libxml pre-2.6.28 (regular comments, of course, are still
            // handled regularly)
            $tokens[] = $this->factory->createComment($node->data);
            return false;
        } elseif ($node->nodeType !== XML_ELEMENT_NODE) {
            // not well tested: there may be other nodes we have to grab
            return false;
        }

        $attr = $node->hasAttributes() ? $this->transformAttrToAssoc($node->attributes) : array();

        // We still have to make sure that the element actually IS empty
        if (!$node->childNodes->length) {
            if ($collect) {
                $tokens[] = $this->factory->createEmpty($node->tagName, $attr);
Bug: The property tagName does not seem to exist in DOMNode.
            }
            return false;
        } else {
            if ($collect) {
                $tokens[] = $this->factory->createStart(
                    $tag_name = $node->tagName, // somehow, it gets dropped
                    $attr
                );
            }
            return true;
        }
    }

    /**
     * @param DOMNode $node
     * @param HTMLPurifier_Token[] $tokens
     */
    protected function createEndNode($node, &$tokens)
    {
        $tokens[] = $this->factory->createEnd($node->tagName);
Bug: The property tagName does not seem to exist in DOMNode.
    }

    /**
     * Converts a DOMNamedNodeMap of DOMAttr objects into an assoc array.
     *
     * @param DOMNamedNodeMap $node_map DOMNamedNodeMap of DOMAttr objects.
     * @return array Associative array of attributes.
     */
    protected function transformAttrToAssoc($node_map)
    {
        // NamedNodeMap is documented very well, so we're using undocumented
        // features, namely, the fact that it implements Iterator and
        // has a ->length attribute
        if ($node_map->length === 0) {
Bug: The property length does not seem to exist in DOMNamedNodeMap.
            return array();
        }
        $array = array();
        foreach ($node_map as $attr) {
            $array[$attr->name] = $attr->value;
        }
        return $array;
    }

    /**
     * An error handler that mutes all errors
     * @param int $errno
     * @param string $errstr
     */
    public function muteErrorHandler($errno, $errstr)
Unused Code: The parameter $errno is not used and could be removed.

This check looks for parameters that have been defined for a function or method, but which are not used in the method body.

Unused Code: The parameter $errstr is not used and could be removed.
    {
    }

    /**
     * Callback function for undoing escaping of stray angled brackets
     * in comments
     * @param array $matches
     * @return string
     */
    public function callbackUndoCommentSubst($matches)
    {
        return '<!--' . strtr($matches[1], array('&amp;' => '&', '&lt;' => '<')) . $matches[2];
    }

    /**
     * Callback function that entity-izes ampersands in comments so that
     * callbackUndoCommentSubst doesn't clobber them
     * @param array $matches
     * @return string
     */
    public function callbackArmorCommentEntities($matches)
    {
        return '<!--' . str_replace('&', '&amp;', $matches[1]) . $matches[2];
    }

    /**
     * Wraps an HTML fragment in the necessary HTML
     * @param string $html
     * @param HTMLPurifier_Config $config
     * @param HTMLPurifier_Context $context
     * @return string
     */
    protected function wrapHTML($html, $config, $context)
Unused Code: The parameter $context is not used and could be removed.

This check looks for parameters that have been defined for a function or method, but which are not used in the method body.
    {
        $def = $config->getDefinition('HTML');
        $ret = '';

        if (!empty($def->doctype->dtdPublic) || !empty($def->doctype->dtdSystem)) {
            $ret .= '<!DOCTYPE html ';
            if (!empty($def->doctype->dtdPublic)) {
                $ret .= 'PUBLIC "' . $def->doctype->dtdPublic . '" ';
Bug: The property doctype does not seem to exist in HTMLPurifier_Definition.
            }
            if (!empty($def->doctype->dtdSystem)) {
                $ret .= '"' . $def->doctype->dtdSystem . '" ';
            }
            $ret .= '>';
        }

        $ret .= '<html><head>';
        $ret .= '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />';
        // No protection if $html contains a stray </div>!
        $ret .= '</head><body>' . $html . '</body></html>';
        return $ret;
    }
}

// vim: et sw=4 sts=4
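
The `Core.AggressivelyFixLt` branch of `tokenizeHTML()` escapes any `<` that cannot start a tag, and repeats the substitution until the string stops changing. The sketch below isolates that loop to show why a single pass may not be enough. The function name `armorStrayLt` is hypothetical and not part of HTML Purifier, and the comment armoring done by the two callbacks is left out for brevity.

```php
<?php
// Hypothetical standalone helper; it only mirrors the do/while loop
// from tokenizeHTML() and is not part of the HTML Purifier API.
function armorStrayLt(string $html): string
{
    // A '<' followed by a character that cannot begin a tag, a closing
    // tag, or a comment/doctype (same character class as the lexer uses).
    $pattern = '/<([^a-z!\/])/i';
    do {
        $old = $html;
        // Each match consumes the following character too, so in a run
        // such as '<<' the second '<' is only seen on the next pass.
        $html = preg_replace($pattern, '&lt;\\1', $html);
    } while ($html !== $old);
    return $html;
}

echo armorStrayLt('a <<<3 <b>x</b>'), "\n";
// prints: a &lt;&lt;&lt;3 <b>x</b>
```

The first pass turns `<<<3` into `&lt;<&lt;3`; the second pass catches the remaining `<` because it is now followed by `&`. Real tags such as `<b>` and `</b>` are never matched.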
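
`transformAttrToAssoc()` relies on `DOMNamedNodeMap` being iterable with `foreach`. A minimal sketch of the same conversion outside the class, assuming the DOM extension is loaded; `attrsToAssoc` and the sample markup are illustrative names and data, not taken from the library:

```php
<?php
// Hypothetical free function mirroring transformAttrToAssoc().
function attrsToAssoc(DOMNamedNodeMap $map): array
{
    $out = array();
    foreach ($map as $attr) {
        // Each item is a DOMAttr exposing ->name and ->value.
        $out[$attr->name] = $attr->value;
    }
    return $out;
}

// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML('<html><body><a href="/x" title="t">link</a></body></html>');
$a = $doc->getElementsByTagName('a')->item(0);

var_export(attrsToAssoc($a->attributes));
```

An element without attributes yields an empty map, so the loop simply produces an empty array; the explicit `length === 0` check in the class is a shortcut rather than a requirement.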