DOMDocumentWrapper::loadMarkupHTML()   F

Complexity

Conditions 20
Paths 8640

Size

Total Lines 89
Code Lines 61

Duplication

Lines 0
Ratio 0 %

Importance

Changes 0
Metric Value
cc 20
eloc 61
nc 8640
nop 2
dl 0
loc 89
rs 2
c 0
b 0
f 0

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign that the commented part should be extracted into a new method. The comment is then a useful starting point for naming that method.

Commonly applied refactorings include Extract Method.
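As a rough illustration of Extract Method, the charset fallback near the top of loadMarkupHTML() (lines 163 to 176 of the listing below) could move into its own small function. The sketch assumes a hypothetical free function named resolveCharset(); it is not part of PhpQuery, and a real refactoring would more likely use a protected method on the class.

```php
<?php
// Hypothetical helper illustrating Extract Method; not part of PhpQuery.
// It mirrors the fallback order used in loadMarkupHTML():
// the document's own charset first, then the requested one, then a default.
function resolveCharset(?string $documentCharset, ?string $requestedCharset, string $defaultCharset): string
{
    if ($documentCharset) {
        return $documentCharset;
    }
    if ($requestedCharset) {
        return $requestedCharset;
    }
    return $defaultCharset;
}

// Usage: no charset found in the document, so the requested one is used.
$charset = resolveCharset(null, 'UTF-8', 'ISO-8859-1');
```

With the fallback extracted, the calling method loses three conditions, and the function name documents what the block was doing.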

1
<?php
2
namespace PhpQuery\Dom;
3
4
use PhpQuery\PhpQuery;
5
6
/**
7
 * DOMDocumentWrapper class simplifies work with DOMDocument.
8
 *
9
 * Known bug:
10
 * - in XHTML fragments, <br /> changes to <br clear="none" />
11
 *
12
 * @todo    check XML catalogs compatibility
13
 * @author  Tobiasz Cudnik <tobiasz.cudnik/gmail.com>
14
 * @package PhpQuery
15
 */
16
class DOMDocumentWrapper
17
{
18
    /**
19
     * @var \DOMDocument
20
     */
21
    public $document;
22
    public $id;
23
    /**
24
     * @todo Rewrite as method and guess if null.
25
     * @var unknown_type
26
     */
27
    public $contentType = '';
28
    public $xpath;
29
    public $uuid = 0;
30
    public $data = array();
31
    public $dataNodes = array();
32
    public $events = array();
33
    public $eventsNodes = array();
34
    public $eventsGlobal = array();
35
    /**
36
     * @TODO iframes support http://code.google.com/p/phpquery/issues/detail?id=28
37
     * @var unknown_type
38
     */
39
    public $frames = array();
40
    /**
41
     * Document root; by default equal to the document itself.
42
     * Used by documentFragments.
43
     *
44
     * @var \DOMNode
45
     */
46
    public $root;
47
    public $isDocumentFragment;
48
    public $isXML = false;
49
    public $isXHTML = false;
50
    public $isHTML = false;
51
    public $charset;
52
53
    public function __construct($markup = null, $contentType = null, $newDocumentID = null)
54
    {
55
        if (isset($markup)) {
56
            $this->load($markup, $contentType, $newDocumentID);
57
        }
58
        $this->id = $newDocumentID ? $newDocumentID : md5(microtime());
59
    }
60
61
    public function load($markup, $contentType = null, $newDocumentID = null)
Unused Code
The parameter $newDocumentID is not used and could be removed.

This check looks for parameters that have been defined for a function or method, but which are not used in the method body.
62
    {
63
        //		PhpQuery::$documents[$id] = $this;
Unused Code Comprehensibility
59% of this comment could be valid code. Did you maybe forget this after debugging?

Sometimes obsolete code just ends up commented out instead of removed. In this case it is better to remove the code once you have checked you do not need it.

The code might also have been commented out for debugging purposes. In this case it is vital that someone uncomments it again or your project may behave in very unexpected ways in production.

This check looks for comments that seem to be mostly valid code and reports them.
64
        $this->contentType = strtolower($contentType);
Documentation Bug
It seems like strtolower($contentType) of type string is incompatible with the declared type object<PhpQuery\Dom\unknown_type> of property $contentType.

Our type inference engine has found an assignment to a property that is incompatible with the declared type of that property.

Either this assignment is in error or the assigned type should be added to the documentation/type hint for that property.
65
        if ($markup instanceof \DOMDocument) {
66
            $this->document = $markup;
67
            $this->root     = $this->document;
68
            $this->charset  = $this->document->encoding;
69
            // TODO isDocumentFragment
70
            $loaded = true;
71
        } else {
72
            $loaded = $this->loadMarkup($markup);
73
        }
74
        if ($loaded) {
75
            //			$this->document->formatOutput = true;
Unused Code Comprehensibility
46% of this comment could be valid code. Did you maybe forget this after debugging?
76
            $this->document->preserveWhiteSpace = true;
77
            $this->xpath                        = new \DOMXPath($this->document);
78
            $this->afterMarkupLoad();
79
            return true;
80
            // remember last loaded document
81
            //			return PhpQuery::selectDocument($id);
Unused Code Comprehensibility
60% of this comment could be valid code. Did you maybe forget this after debugging?
82
        }
83
        return false;
84
    }
85
86
    protected function afterMarkupLoad()
87
    {
88
        if ($this->isXHTML) {
89
            $this->xpath->registerNamespace("html", "http://www.w3.org/1999/xhtml");
90
        }
91
    }
92
93
    protected function loadMarkup($markup)
94
    {
95
        $loaded = false;
96
        if ($this->contentType) {
97
            self::debug("Load markup for content type {$this->contentType}");
98
            // content determined by contentType
99
            list($contentType, $charset) = $this->contentTypeToArray($this->contentType);
100
            switch ($contentType) {
101
                case 'text/html':
102
                    PhpQuery::debug("Loading HTML, content type '{$this->contentType}'");
103
                    $loaded = $this->loadMarkupHTML($markup, $charset);
104
                    break;
105
                case 'text/xml':
106
                case 'application/xhtml+xml':
107
                    PhpQuery::debug("Loading XML, content type '{$this->contentType}'");
108
                    $loaded = $this->loadMarkupXML($markup, $charset);
109
                    break;
110
                default:
111
                    // for feeds or anything that sometimes doesn't use text/xml
112
                    if (strpos($this->contentType, 'xml') !== false) {
113
                        PhpQuery::debug("Loading XML, content type '{$this->contentType}'");
114
                        $loaded = $this->loadMarkupXML($markup, $charset);
115
                    } else {
116
                        PhpQuery::debug("Could not determine document type from content type '{$this->contentType}'");
117
                    }
118
            }
119
        } else {
120
            // content type autodetection
121
            if ($this->isXML($markup)) {
122
                PhpQuery::debug("Loading XML, isXML() == true");
123
                $loaded = $this->loadMarkupXML($markup);
124
                if (!$loaded && $this->isXHTML) {
125
                    PhpQuery::debug('Loading as XML failed, trying to load as HTML, isXHTML == true');
126
                    $loaded = $this->loadMarkupHTML($markup);
127
                }
128
            } else {
129
                PhpQuery::debug("Loading HTML, isXML() == false");
130
                $loaded = $this->loadMarkupHTML($markup);
131
            }
132
        }
133
        return $loaded;
134
    }
135
136
    protected function loadMarkupReset()
137
    {
138
        $this->isXML = $this->isXHTML = $this->isHTML = false;
139
    }
140
141
    protected function documentCreate($charset, $version = '1.0')
142
    {
143
        if (!$version) {
144
            $version = '1.0';
145
        }
146
        $this->document = new \DOMDocument($version, $charset);
147
        $this->charset  = $this->document->encoding;
148
        //		$this->document->encoding = $charset;
Unused Code Comprehensibility
46% of this comment could be valid code. Did you maybe forget this after debugging?
149
        $this->document->formatOutput       = true;
150
        $this->document->preserveWhiteSpace = true;
151
    }
152
153
    protected function loadMarkupHTML($markup, $requestedCharset = null)
154
    {
155
        if (PhpQuery::$debug) {
156
            PhpQuery::debug('Full markup load (HTML): ' . substr($markup, 0, 250));
157
        }
158
        $this->loadMarkupReset();
159
        $this->isHTML = true;
160
        if (!isset($this->isDocumentFragment)) {
161
            $this->isDocumentFragment = self::isDocumentFragmentHTML($markup);
162
        }
163
        $charset            = null;
164
        $documentCharset    = $this->charsetFromHTML($markup);
165
        $addDocumentCharset = false;
166
        if ($documentCharset) {
167
            $charset = $documentCharset;
168
            $markup  = $this->charsetFixHTML($markup);
169
        } else {
170
            if ($requestedCharset) {
171
                $charset = $requestedCharset;
172
            }
173
        }
174
        if (!$charset) {
175
            $charset = PhpQuery::$defaultCharset;
176
        }
177
        // HTTP 1.1 says that the default charset is ISO-8859-1
178
        // @see http://www.w3.org/International/O-HTTP-charset
179
        if (!$documentCharset) {
180
            $documentCharset    = 'ISO-8859-1';
181
            $addDocumentCharset = true;
182
        }
183
        // Should be careful here, still need 'magic encoding detection' since lots of pages have other 'default encoding'
184
        // Worse, some pages can have mixed encodings... we'll try not to worry about that
185
        $requestedCharset = strtoupper($requestedCharset);
186
        $documentCharset  = strtoupper($documentCharset);
187
        PhpQuery::debug("DOC: $documentCharset REQ: $requestedCharset");
188
        if ($requestedCharset && $documentCharset
189
            && $requestedCharset !== $documentCharset
190
        ) {
191
            PhpQuery::debug("CHARSET CONVERT");
192
            // Document Encoding Conversion
193
            // http://code.google.com/p/phpquery/issues/detail?id=86
194
            if (function_exists('mb_detect_encoding')) {
195
                $possibleCharsets = array(
196
                    $documentCharset,
197
                    $requestedCharset,
198
                    'AUTO'
199
                );
200
                $docEncoding      = mb_detect_encoding($markup, implode(', ', $possibleCharsets));
201
                if (!$docEncoding) {
202
                    $docEncoding = $documentCharset;
203
                }
204
                // ok trust the document
205
                PhpQuery::debug("DETECTED '$docEncoding'");
206
                // Detected does not match what document says...
207
                if ($docEncoding !== $documentCharset) {
Unused Code
This if statement is empty and can be removed.

This check looks for the bodies of if statements that have no statements or where all statements have been commented out. This may be the result of changes for debugging or the code may simply be obsolete.

These if bodies can be removed. If you have an empty if but statements in the else branch, consider inverting the condition.

if (rand(1, 6) > 3) {
//print "Check failed";
} else {
    print "Check succeeded";
}

could be turned into

if (rand(1, 6) <= 3) {
    print "Check succeeded";
}

This is much more concise to read.

208
                    // Tricky..
209
                }
210
                if ($docEncoding !== $requestedCharset) {
211
                    PhpQuery::debug("CONVERT $docEncoding => $requestedCharset");
212
                    $markup  = mb_convert_encoding($markup, $requestedCharset, $docEncoding);
213
                    $markup  = $this->charsetAppendToHTML($markup, $requestedCharset);
214
                    $charset = $requestedCharset;
215
                }
216
            } else {
217
                PhpQuery::debug("TODO: charset conversion without mbstring...");
218
            }
219
        }
220
        $return = false;
Unused Code
$return is not used, you could remove the assignment.

This check looks for variable assignments that are either overwritten by other assignments or where the variable is not used subsequently.

$myVar = 'Value';
$higher = false;

if (rand(1, 6) > 3) {
    $higher = true;
} else {
    $higher = false;
}

Both the $myVar assignment in line 1 and the $higher assignment in line 2 are dead. The first because $myVar is never used and the second because $higher is always overwritten on every possible code path.
221
        if ($this->isDocumentFragment) {
222
            PhpQuery::debug("Full markup load (HTML), DocumentFragment detected, using charset '$charset'");
223
            $return = $this->documentFragmentLoadMarkup($this, $charset, $markup);
224
        } else {
225
            if ($addDocumentCharset) {
226
                PhpQuery::debug("Full markup load (HTML), appending charset: '$charset'");
227
                $markup = $this->charsetAppendToHTML($markup, $charset);
228
            }
229
            PhpQuery::debug("Full markup load (HTML), documentCreate('$charset')");
230
            $this->documentCreate($charset);
231
            $return = PhpQuery::$debug === 2 ? $this->document->loadHTML($markup)
232
                : @$this->document->loadHTML($markup);
233
            if ($return) {
234
                $this->root = $this->document;
235
            }
236
        }
237
        if ($return && !$this->contentType) {
238
            $this->contentType = 'text/html';
Documentation Bug
It seems like 'text/html' of type string is incompatible with the declared type object<PhpQuery\Dom\unknown_type> of property $contentType.
239
        }
240
        return $return;
241
    }
242
243
    protected function loadMarkupXML($markup, $requestedCharset = null)
244
    {
245
        if (PhpQuery::$debug) {
246
            PhpQuery::debug('Full markup load (XML): ' . substr($markup, 0, 250));
247
        }
248
        $this->loadMarkupReset();
249
        $this->isXML = true;
250
        // check against XHTML in contentType or markup
251
        $isContentTypeXHTML = $this->isXHTML();
252
        $isMarkupXHTML      = $this->isXHTML($markup);
253
        if ($isContentTypeXHTML || $isMarkupXHTML) {
254
            self::debug('Full markup load (XML), XHTML detected');
255
            $this->isXHTML = true;
256
        }
257
        // determine document fragment
258
        if (!isset($this->isDocumentFragment)) {
259
            $this->isDocumentFragment = $this->isXHTML ? self::isDocumentFragmentXHTML($markup)
260
                : self::isDocumentFragmentXML($markup);
261
        }
262
        // this charset will be used
263
        $charset = null;
264
        // charset from XML declaration @var string
265
        $documentCharset = $this->charsetFromXML($markup);
266
        if (!$documentCharset) {
Bug Best Practice
The expression $documentCharset of type string|null is loosely compared to false; this is ambiguous if the string can be empty. You might want to explicitly use === null instead.

In PHP, under loose comparison (like ==, or !=, or switch conditions), values of different types might be equal.

For string values, the empty string '' is a special case, in particular the following results might be unexpected:

''   == false // true
''   == null  // true
'ab' == false // false
'ab' == null  // false

// It is often better to use strict comparison
'' === false // false
'' === null  // false
267
            if ($this->isXHTML) {
268
                // this is XHTML, try to get charset from content-type meta header
269
                $documentCharset = $this->charsetFromHTML($markup);
270
                if ($documentCharset) {
271
                    PhpQuery::debug("Full markup load (XML), appending XHTML charset '$documentCharset'");
272
                    $this->charsetAppendToXML($markup, $documentCharset);
Unused Code
The call to the method PhpQuery\Dom\DOMDocument...r::charsetAppendToXML() seems un-needed as the method has no side-effects.

PHP Analyzer performs a side-effects analysis of your code. A side-effect is basically anything that might be visible after the scope of the method is left.

Let’s take a look at an example:

class User
{
    private $email;

    public function getEmail()
    {
        return $this->email;
    }

    public function setEmail($email)
    {
        $this->email = $email;
    }
}

If we look at the getEmail() method, we can see that it has no side-effect. Whether you call this method or not, no future calls to other methods are affected by this. As such code as the following is useless:

$user = new User();
$user->getEmail(); // This line could safely be removed as it has no effect.

On the other hand, if we look at setEmail(), this method _has_ side-effects. In the following case, we could not remove the method call:

$user = new User();
$user->setEmail('email@domain'); // This line has a side-effect (it changes an
                                 // instance variable).
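
For the flagged call on line 272 of the listing, removing the call is probably not the right fix. If charsetAppendToXML() returns the modified markup, as the assignment on line 303 of the listing suggests, the result has to be assigned back for the call to have any effect. A minimal sketch of the difference, using a hypothetical stand-in function (prependXmlDeclaration() is an assumption for illustration; the real method body is not shown here):

```php
<?php
// Hypothetical stand-in for a side-effect-free helper such as
// charsetAppendToXML(); the real implementation is not shown in this report.
function prependXmlDeclaration(string $markup, string $charset): string
{
    return '<?xml version="1.0" encoding="' . $charset . '"?>' . $markup;
}

$markup = '<root/>';

// Result discarded: the function has no side effects, so $markup is unchanged.
prependXmlDeclaration($markup, 'utf-8');
$afterDiscardedCall = $markup;

// Result assigned back: the declaration is now part of $markup.
$markup = prependXmlDeclaration($markup, 'utf-8');
```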
273
                    $charset = $documentCharset;
274
                }
275
            }
276
            if (!$documentCharset) {
277
                // if still no document charset...
278
                $charset = $requestedCharset;
279
            }
280
        } else {
281
            if ($requestedCharset) {
282
                $charset = $requestedCharset;
283
            }
284
        }
285
        if (!$charset) {
286
            $charset = PhpQuery::$defaultCharset;
287
        }
288
        if ($requestedCharset && $documentCharset
Unused Code
This if statement is empty and can be removed.
289
            && $requestedCharset != $documentCharset
290
        ) {
291
            // TODO place for charset conversion
292
            //			$charset = $requestedCharset;
Unused Code Comprehensibility
43% of this comment could be valid code. Did you maybe forget this after debugging?
293
        }
294
        $return = false;
Unused Code
$return is not used, you could remove the assignment.
295
        if ($this->isDocumentFragment) {
296
            PhpQuery::debug("Full markup load (XML), DocumentFragment detected, using charset '$charset'");
297
            $return = $this->documentFragmentLoadMarkup($this, $charset, $markup);
298
        } else {
299
            // FIXME ???
Unused Code Comprehensibility
50% of this comment could be valid code. Did you maybe forget this after debugging?
300
            if ($isContentTypeXHTML && !$isMarkupXHTML) {
301
                if (!$documentCharset) {
302
                    PhpQuery::debug("Full markup load (XML), appending charset '$charset'");
303
                    $markup = $this->charsetAppendToXML($markup, $charset);
304
                }
305
            }
306
            // see http://pl2.php.net/manual/en/book.dom.php#78929
307
            // LIBXML_DTDLOAD (>= PHP 5.1)
308
            // do XML catalogues work with LIBXML_NONET?
309
            //		$this->document->resolveExternals = true;
Unused Code Comprehensibility
46% of this comment could be valid code. Did you maybe forget this after debugging?
310
            // TODO test LIBXML_COMPACT for performance improvement
311
            // create document
312
            $this->documentCreate($charset);
313
            if (version_compare(PHP_VERSION, '5.1.0', '<')) {
314
                $this->document->resolveExternals = true;
315
                $return                           = PhpQuery::$debug === 2 ? $this->document->loadXML($markup)
Security XML Injection
$markup can contain request data and is used in xml context(s) leading to a potential security vulnerability.

1 path for user data to reach this point

  1. Read from $_POST, and $_POST['data'] is passed to jQueryServer::__construct()
    in src/Proxy/jQueryServer.php on line 118
  2. Data is passed through trim(), and Data is decoded by json_decode()
    in vendor/src/PhpQuery.php on line 1173
  3. $data is assigned
    in src/Proxy/jQueryServer.php on line 57
  4. jQueryServer::$options is assigned
    in src/Proxy/jQueryServer.php on line 60
  5. Tainted property jQueryServer::$options is read, and $ajax is assigned
    in src/Proxy/jQueryServer.php on line 61
  6. $ajax is assigned
    in src/Proxy/jQueryServer.php on line 63
  7. $ajax is passed to PhpQuery::ajax()
    in src/Proxy/jQueryServer.php on line 67
  8. $options is passed through array_merge(), and $options is assigned
    in src/PhpQuery.php on line 818
  9. $documentID is assigned
    in src/PhpQuery.php on line 819
  10. $documentID is passed to PhpQueryEvents::trigger()
    in src/PhpQuery.php on line 935
  11. $documentID is assigned
    in src/PhpQueryEvents.php on line 30
  12. $documentID is passed to PhpQuery::getDocument()
    in src/PhpQueryEvents.php on line 39
  13. $id is passed to PhpQueryObject::__construct()
    in src/PhpQuery.php on line 318
  14. $id is assigned
    in src/PhpQueryObject.php on line 108
  15. PhpQueryObject::$documentID is assigned
    in src/PhpQueryObject.php on line 115
  16. Tainted property PhpQueryObject::$documentID is read
    in src/PhpQueryObject.php on line 252
  17. PhpQueryObject::getDocumentID() returns tainted data, and $this->getDocumentID() is passed to PhpQuery::pq()
    in src/PhpQueryObject.php on line 1991
  18. $context is passed to PhpQuery::newDocument()
    in src/PhpQuery.php on line 197
  19. $markup is passed to PhpQuery::createDocumentWrapper()
    in src/PhpQuery.php on line 333
  20. $html is passed to DOMDocumentWrapper::__construct()
    in src/PhpQuery.php on line 596
  21. $markup is passed to DOMDocumentWrapper::load()
    in src/Dom/DOMDocumentWrapper.php on line 56
  22. $markup is passed to DOMDocumentWrapper::loadMarkup()
    in src/Dom/DOMDocumentWrapper.php on line 72
  23. $markup is passed to DOMDocumentWrapper::loadMarkupXML()
    in src/Dom/DOMDocumentWrapper.php on line 108

Preventing XML Injection Attacks

If you pass user-data to XML parsing functions like simplexml_load_string(), this can be abused to inject external entities to gain access to the contents of any file in your filesystem, or it can be used to freeze your PHP process with an entity expansion attack.

In order to prevent that, make sure to disable external entity loading and disallow custom doc-types:

libxml_disable_entity_loader(true);

$dom = new DOMDocument;
$dom->loadXML($taintedXml);
foreach ($dom->childNodes as $child) {
    if ($child->nodeType === XML_DOCUMENT_TYPE_NODE) {
        throw new InvalidArgumentException(
            'Invalid XML: Detected use of illegal DOCTYPE'
        );
    }
}

// It is now safe to use $taintedXml
$xml = simplexml_load_string($taintedXml);
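
Note that libxml_disable_entity_loader() is deprecated as of PHP 8.0, where external entity loading is disabled by default when PHP is built against libxml 2.9.0 or later. On such versions the same DOCTYPE check might be sketched as follows. The function name loadUntrustedXml() is hypothetical and not part of PhpQuery, and the sketch assumes that rejecting every DOCTYPE is acceptable for the application.

```php
<?php
// Hypothetical helper, not part of PhpQuery: parse untrusted XML and
// refuse any document that declares a DOCTYPE.
function loadUntrustedXml(string $xml): DOMDocument
{
    // Collect parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    // LIBXML_NONET forbids network access while parsing. Entity substitution
    // stays off because LIBXML_NOENT is deliberately not passed.
    $loaded = $dom->loadXML($xml, LIBXML_NONET);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        throw new InvalidArgumentException('Invalid XML: could not be parsed');
    }
    foreach ($dom->childNodes as $child) {
        if ($child->nodeType === XML_DOCUMENT_TYPE_NODE) {
            throw new InvalidArgumentException('Invalid XML: DOCTYPE is not allowed');
        }
    }
    return $dom;
}
```

Callers would catch InvalidArgumentException and treat the input as rejected, rather than passing the markup on to loadXML() a second time.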

General Strategies to prevent injection

In general, it is advisable to prevent any user data from reaching this point. This can be done by white-listing certain values:

if ( ! in_array($value, array('this-is-allowed', 'and-this-too'), true)) {
    throw new \InvalidArgumentException('This input is not allowed.');
}

For numeric data, we recommend explicitly casting the data:

$sanitized = (integer) $tainted;
316
                    : @$this->document->loadXML($markup);
Security XML Injection
$markup can contain request data and is used in xml context(s) leading to a potential security vulnerability.
317
            } else {
318
                /** @link http://pl2.php.net/manual/en/libxml.constants.php */
319
                $libxmlStatic = PhpQuery::$debug === 2 ? LIBXML_DTDLOAD
320
                    | LIBXML_DTDATTR | LIBXML_NONET
321
                    : LIBXML_DTDLOAD | LIBXML_DTDATTR | LIBXML_NONET | LIBXML_NOWARNING
322
                    | LIBXML_NOERROR;
323
                $return       = $this->document->loadXML($markup, $libxmlStatic);
Security XML Injection
$markup can contain request data and is used in xml context(s) leading to a potential security vulnerability.

1 path for user data to reach this point

  1. Read from $_POST, and $_POST['data'] is passed to jQueryServer::__construct()
    in src/Proxy/jQueryServer.php on line 118
  2. Data is passed through trim(), and Data is decoded by json_decode()
    in vendor/src/PhpQuery.php on line 1173
  3. $data is assigned
    in src/Proxy/jQueryServer.php on line 57
  4. jQueryServer::$options is assigned
    in src/Proxy/jQueryServer.php on line 60
  5. Tainted property jQueryServer::$options is read, and $ajax is assigned
    in src/Proxy/jQueryServer.php on line 61
  6. $ajax is assigned
    in src/Proxy/jQueryServer.php on line 63
  7. $ajax is passed to PhpQuery::ajax()
    in src/Proxy/jQueryServer.php on line 67
  8. $options is passed through array_merge(), and $options is assigned
    in src/PhpQuery.php on line 818
  9. $documentID is assigned
    in src/PhpQuery.php on line 819
  10. $documentID is passed to PhpQueryEvents::trigger()
    in src/PhpQuery.php on line 935
  11. $documentID is assigned
    in src/PhpQueryEvents.php on line 30
  12. $documentID is passed to PhpQuery::getDocument()
    in src/PhpQueryEvents.php on line 39
  13. $id is passed to PhpQueryObject::__construct()
    in src/PhpQuery.php on line 318
  14. $id is assigned
    in src/PhpQueryObject.php on line 108
  15. PhpQueryObject::$documentID is assigned
    in src/PhpQueryObject.php on line 115
  16. Tainted property PhpQueryObject::$documentID is read
    in src/PhpQueryObject.php on line 252
  17. PhpQueryObject::getDocumentID() returns tainted data, and $this->getDocumentID() is passed to PhpQuery::pq()
    in src/PhpQueryObject.php on line 1991
  18. $context is passed to PhpQuery::newDocument()
    in src/PhpQuery.php on line 197
  19. $markup is passed to PhpQuery::createDocumentWrapper()
    in src/PhpQuery.php on line 333
  20. $html is passed to DOMDocumentWrapper::__construct()
    in src/PhpQuery.php on line 596
  21. $markup is passed to DOMDocumentWrapper::load()
    in src/Dom/DOMDocumentWrapper.php on line 56
  22. $markup is passed to DOMDocumentWrapper::loadMarkup()
    in src/Dom/DOMDocumentWrapper.php on line 72
  23. $markup is passed to DOMDocumentWrapper::loadMarkupXML()
    in src/Dom/DOMDocumentWrapper.php on line 108

324
                // 				if (! $return)
Unused Code Comprehensibility introduced by
63% of this comment could be valid code. Did you maybe forget this after debugging?

Sometimes obsolete code just ends up commented out instead of removed. In this case it is better to remove the code once you have checked you do not need it.

The code might also have been commented out for debugging purposes. In this case it is vital that someone uncomments it again or your project may behave in very unexpected ways in production.

This check looks for comments that seem to be mostly valid code and reports them.

325
                // 					$return = $this->document->loadHTML($markup);
Unused Code Comprehensibility introduced by
58% of this comment could be valid code. Did you maybe forget this after debugging?

326
            }
327
            if ($return) {
328
                $this->root = $this->document;
329
            }
330
        }
331
        if ($return) {
332
            if (!$this->contentType) {
333
                if ($this->isXHTML) {
334
                    $this->contentType = 'application/xhtml+xml';
Documentation Bug introduced by
It seems like 'application/xhtml+xml' of type string is incompatible with the declared type object<PhpQuery\Dom\unknown_type> of property $contentType.

Our type inference engine has found an assignment to a property that is incompatible with the declared type of that property.

Either this assignment is in error or the assigned type should be added to the documentation/type hint for that property.
335
                } else {
336
                    $this->contentType = 'text/xml';
Documentation Bug introduced by
It seems like 'text/xml' of type string is incompatible with the declared type object<PhpQuery\Dom\unknown_type> of property $contentType.

337
                }
338
            }
339
            return $return;
340
        } else {
341
            throw new \Exception("Error loading XML markup");
342
        }
343
    }
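
The two Documentation Bug findings in the method above stem from the `@var unknown_type` annotation on $contentType rather than from the assignments themselves. A minimal sketch of the documentation fix follows. ContentTypeHolder is a hypothetical stand-in for the wrapper class, and only the docblock differs from the analyzed property.

```php
<?php

// Hypothetical stand-in class; only the property docblock matters here.
class ContentTypeHolder
{
    /**
     * MIME type of the loaded document, or '' while it is still unknown.
     *
     * @var string
     */
    public $contentType = '';
}

$holder = new ContentTypeHolder();
// With @var string, assigning a MIME type string matches the declared type.
$holder->contentType = 'application/xhtml+xml';
```

Declaring the property as a string matches both assigned values ('application/xhtml+xml' and 'text/xml') as well as the empty-string default.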
344
345
    protected function isXHTML($markup = null)
346
    {
347
        if (!isset($markup)) {
348
            return strpos($this->contentType, 'xhtml') !== false;
349
        }
350
        // XXX ok ?
351
        return strpos($markup, "<!DOCTYPE html") !== false;
352
        //		return stripos($doctype, 'xhtml') !== false;
Unused Code Comprehensibility introduced by
57% of this comment could be valid code. Did you maybe forget this after debugging?

353
        //		$doctype = isset($dom->doctype) && is_object($dom->doctype)
Unused Code Comprehensibility introduced by
56% of this comment could be valid code. Did you maybe forget this after debugging?

354
        //			? $dom->doctype->publicId
Unused Code Comprehensibility introduced by
50% of this comment could be valid code. Did you maybe forget this after debugging?

355
        //			: self::$defaultDoctype;
Unused Code Comprehensibility introduced by
72% of this comment could be valid code. Did you maybe forget this after debugging?

356
    }
357
358
    protected function isXML($markup)
359
    {
360
        //		return strpos($markup, '<?xml') !== false && stripos($markup, 'xhtml') === false;
Unused Code Comprehensibility introduced by
54% of this comment could be valid code. Did you maybe forget this after debugging?

361
        return strpos(substr($markup, 0, 100), '<' . '?xml') !== false;
362
    }
363
364
    protected function contentTypeToArray($contentType)
365
    {
366
        $test = null;
Unused Code introduced by
$test is not used, you could remove the assignment.

This check looks for variable assignments that are either overwritten by other assignments or never read afterwards.

$myVar = 'Value';
$higher = false;

if (rand(1, 6) > 3) {
    $higher = true;
} else {
    $higher = false;
}

Both the $myVar assignment in line 1 and the $higher assignment in line 2 are dead: the first because $myVar is never used, and the second because $higher is always overwritten on every possible code path.
367
        $test = $matches = explode(';', trim(strtolower($contentType)));
Unused Code introduced by
$test is not used, you could remove the assignment.

368
        if (isset($matches[1])) {
369
            $matches[1] = explode('=', $matches[1]);
370
            // strip 'charset='
371
            $matches[1] = isset($matches[1][1]) && trim($matches[1][1]) ? $matches[1][1]
372
                : $matches[1][0];
373
        } else {
374
            $matches[1] = null;
375
        }
376
        return $matches;
377
    }
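
Both Unused Code findings in contentTypeToArray() concern $test, which is assigned twice and never read. A minimal sketch of the method with the dead assignments removed follows. It is written as a free function with the hypothetical name contentTypeToArraySketch() so it can run on its own, and it uses a separate $parameter variable for the split "charset=..." part, which is a readability choice rather than part of the original.

```php
<?php

/**
 * Sketch of contentTypeToArray() without the unused $test variable.
 *
 * @return array [content type, charset or null]
 */
function contentTypeToArraySketch(string $contentType): array
{
    $matches = explode(';', trim(strtolower($contentType)));
    if (isset($matches[1])) {
        $parameter = explode('=', $matches[1]);
        // strip 'charset='
        $matches[1] = isset($parameter[1]) && trim($parameter[1])
            ? $parameter[1]
            : $parameter[0];
    } else {
        $matches[1] = null;
    }

    return $matches;
}
```

The return value is unchanged from the analyzed method: the first element is the lower-cased content type and the second is the charset, or null when no parameter is present.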
378
379
    /**
380
     *
381
     * @param $markup
382
     * @return array contentType, charset
383
     */
384
    protected function contentTypeFromHTML($markup)
385
    {
386
        $matches = array();
387
        // find meta tag
388
        preg_match('@<meta[^>]+http-equiv\\s*=\\s*(["|\'])Content-Type\\1([^>]+?)>@i', $markup, $matches);
389
        if (!isset($matches[0])) {
390
            return array(
391
                null,
392
                null
393
            );
394
        }
395
        // get attr 'content'
396
        preg_match('@content\\s*=\\s*(["|\'])(.+?)\\1@', $matches[0], $matches);
397
        if (!isset($matches[0])) {
398
            return array(
399
                null,
400
                null
401
            );
402
        }
403
        return $this->contentTypeToArray($matches[2]);
404
    }
405
406
    protected function charsetFromHTML($markup)
407
    {
408
        $contentType = $this->contentTypeFromHTML($markup);
409
        return $contentType[1];
410
    }
411
412
    protected function charsetFromXML($markup)
413
    {
414
        $matches;
Bug introduced by
The variable $matches seems only to be defined at a later point. Did you maybe move this code here without moving the variable definition?

This error can happen if you refactor code and forget to move the variable initialization.

Let’s take a look at a simple example:

function someFunction() {
    $x = 5;
    echo $x;
}

The above code is perfectly fine. Now imagine that we re-order the statements:

function someFunction() {
    echo $x;
    $x = 5;
}

In that case, $x would be read before it is initialized. This was a very basic example, however the principle is the same for the found issue.

415
        // find declaration
416
        preg_match('@<' . '?xml[^>]+encoding\\s*=\\s*(["|\'])(.*?)\\1@i', $markup, $matches);
417
        return isset($matches[2]) ? strtolower($matches[2]) : null;
418
    }
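
The Bug finding in charsetFromXML() is triggered by the bare `$matches;` statement, which reads the variable before preg_match() fills it. One way to resolve it, sketched here as a standalone function with the hypothetical name charsetFromXmlDeclaration(), is to initialize the array first, as contentTypeFromHTML() already does. The pattern is the original one with the '<' . '?xml' concatenation written as a single escaped literal.

```php
<?php

/**
 * Sketch: read the encoding from an XML declaration, or return null.
 */
function charsetFromXmlDeclaration(string $markup): ?string
{
    // Initialize before use so the variable is defined on every path.
    $matches = array();
    // find declaration
    preg_match('@<\?xml[^>]+encoding\s*=\s*(["\'])(.*?)\1@i', $markup, $matches);

    return isset($matches[2]) ? strtolower($matches[2]) : null;
}
```

Removing the bare statement altogether would also work, because preg_match() assigns $matches by reference. The explicit initialization simply makes the intent visible.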
419
420
    /**
421
     * Repositions meta[type=charset] at the start of head. Bypasses \DOMDocument bug.
422
     *
423
     * @link http://code.google.com/p/phpquery/issues/detail?id=80
424
     * @param $html
425
     */
426
    protected function charsetFixHTML($markup)
427
    {
428
        $matches = array();
429
        // find meta tag
430
        preg_match(
431
            '@\s*<meta[^>]+http-equiv\\s*=\\s*(["|\'])Content-Type\\1([^>]+?)>@i',
432
            $markup,
433
            $matches,
434
            PREG_OFFSET_CAPTURE
435
        );
436
        if (!isset($matches[0])) {
437
            return;
438
        }
439
        $metaContentType = $matches[0][0];
440
        $markup          = substr($markup, 0, $matches[0][1])
441
            . substr($markup, $matches[0][1] + strlen($metaContentType));
442
        $headStart       = stripos($markup, '<head>');
443
        $markup          = substr($markup, 0, $headStart + 6) . $metaContentType
444
            . substr($markup, $headStart + 6);
445
        return $markup;
446
    }
447
448
    protected function charsetAppendToHTML($html, $charset, $xhtml = false)
449
    {
450
        // remove existing meta[type=content-type]
451
        $html = preg_replace('@\s*<meta[^>]+http-equiv\\s*=\\s*(["|\'])Content-Type\\1([^>]+?)>@i', '', $html);
452
        $meta = '<meta http-equiv="Content-Type" content="text/html;charset='
453
            . $charset . '" ' . ($xhtml ? '/' : '') . '>';
454
        if (strpos($html, '<head') === false) {
455
            if (strpos($html, '<html') === false) {
456
                return $meta . $html;
457
            } else {
458
                return preg_replace('@<html(.*?)(?(?<!\?)>)@s', "<html\\1><head>{$meta}</head>", $html);
459
            }
460
        } else {
461
            return preg_replace('@<head(.*?)(?(?<!\?)>)@s', '<head\\1>' . $meta, $html);
462
        }
463
    }
464
465
    protected function charsetAppendToXML($markup, $charset)
466
    {
467
        $declaration = '<' . '?xml version="1.0" encoding="' . $charset . '"?'
468
            . '>';
469
        return $declaration . $markup;
470
    }
471
472
    public static function isDocumentFragmentHTML($markup)
473
    {
474
        return stripos($markup, '<html') === false
475
        && stripos($markup, '<!doctype') === false;
476
    }
477
478
    public static function isDocumentFragmentXML($markup)
479
    {
480
        return stripos($markup, '<' . '?xml') === false;
481
    }
482
483
    public static function isDocumentFragmentXHTML($markup)
484
    {
485
        return self::isDocumentFragmentHTML($markup);
486
    }
487
488
    public function importAttr($value)
Unused Code introduced by
The parameter $value is not used and could be removed.

This check looks for parameters that have been defined for a function or method but are not used in the method body.
489
    {
490
        // TODO
491
    }
492
493
    /**
494
     *
495
     * @param $source
496
     * @param $target
497
     * @param $sourceCharset
498
     * @return array Array of imported nodes.
499
     */
500
    public function import($source, $sourceCharset = null)
501
    {
502
        // TODO charset conversions
503
        $return = array();
504
        if ($source instanceof \DOMNode && !($source instanceof \DOMNodeList)) {
505
            $source = array(
506
                $source
507
            );
508
        }
509
        //		if (is_array($source)) {
Unused Code Comprehensibility introduced by
64% of this comment could be valid code. Did you maybe forget this after debugging?

510
        //			foreach($source as $node) {
Unused Code Comprehensibility introduced by
64% of this comment could be valid code. Did you maybe forget this after debugging?

511
        //				if (is_string($node)) {
Unused Code Comprehensibility introduced by
64% of this comment could be valid code. Did you maybe forget this after debugging?

512
        //					// string markup
513
        //					$fake = $this->documentFragmentCreate($node, $sourceCharset);
Unused Code Comprehensibility introduced by
60% of this comment could be valid code. Did you maybe forget this after debugging?

514
        //					if ($fake === false)
Unused Code Comprehensibility introduced by
50% of this comment could be valid code. Did you maybe forget this after debugging?

515
        //						throw new \Exception("Error loading documentFragment markup");
Unused Code Comprehensibility introduced by
60% of this comment could be valid code. Did you maybe forget this after debugging?

516
        //					else
517
        //						$return = array_merge($return,
Unused Code Comprehensibility introduced by
45% of this comment could be valid code. Did you maybe forget this after debugging?

518
        //							$this->import($fake->root->childNodes)
Unused Code Comprehensibility introduced by
64% of this comment could be valid code. Did you maybe forget this after debugging?

519
        //						);
520
        //				} else {
Unused Code Comprehensibility introduced by
50% of this comment could be valid code. Did you maybe forget this after debugging?

521
        //					$return[] = $this->document->importNode($node, true);
Unused Code Comprehensibility introduced by
64% of this comment could be valid code. Did you maybe forget this after debugging?

522
        //				}
523
        //			}
524
        //			return $return;
525
        //		} else {
Unused Code Comprehensibility introduced by
50% of this comment could be valid code. Did you maybe forget this after debugging?

526
        //			// string markup
527
        //			$fake = $this->documentFragmentCreate($source, $sourceCharset);
Unused Code Comprehensibility introduced by
60% of this comment could be valid code. Did you maybe forget this after debugging?

528
        //			if ($fake === false)
Unused Code Comprehensibility introduced by
50% of this comment could be valid code. Did you maybe forget this after debugging?

529
        //				throw new \Exception("Error loading documentFragment markup");
Unused Code Comprehensibility introduced by
60% of this comment could be valid code. Did you maybe forget this after debugging?

530
        //			else
531
        //				return $this->import($fake->root->childNodes);
Unused Code Comprehensibility introduced by
65% of this comment could be valid code. Did you maybe forget this after debugging?

532
        //		}
533
        if (is_array($source) || $source instanceof \DOMNodeList) {
534
            // dom nodes
535
            self::debug('Importing nodes to document');
536
            foreach ($source as $node) {
537
                $return[] = $this->document->importNode($node, true);
538
            }
539
        } else {
540
            // string markup
541
            $fake = $this->documentFragmentCreate($source, $sourceCharset);
542
            if ($fake === false) {
543
                throw new \Exception("Error loading documentFragment markup");
544
            } else {
545
                return $this->import($fake->root->childNodes);
546
            }
547
        }
548
        return $return;
549
    }
550
551
    /**
552
     * Creates new document fragment.
553
     *
554
     * @param $source
555
     * @return DOMDocumentWrapper
556
     */
557
    protected function documentFragmentCreate($source, $charset = null)
558
    {
559
        $fake              = new DOMDocumentWrapper();
560
        $fake->contentType = $this->contentType;
561
        $fake->isXML       = $this->isXML;
562
        $fake->isHTML      = $this->isHTML;
563
        $fake->isXHTML     = $this->isXHTML;
564
        $fake->root        = $fake->document;
565
        if (!$charset) {
566
            $charset = $this->charset;
567
        }
568
        //	$fake->documentCreate($this->charset);
Unused Code Comprehensibility introduced by
70% of this comment could be valid code. Did you maybe forget this after debugging?

569
        if ($source instanceof \DOMNode && !($source instanceof \DOMNodeList)) {
570
            $source = array(
571
                $source
572
            );
573
        }
574
        if (is_array($source) || $source instanceof \DOMNodeList) {
575
            // dom nodes
576
            // load fake document
577
            if (!$this->documentFragmentLoadMarkup($fake, $charset)) {
578
                return false;
Bug Best Practice introduced by
The return type of return false; (false) is incompatible with the return type documented by PhpQuery\Dom\DOMDocument...:documentFragmentCreate of type PhpQuery\Dom\DOMDocumentWrapper.

If you return a value from a function or method, it should be a sub-type of the type declared by the parent type, e.g. an interface or abstract method. This is more formally defined by the Liskov substitution principle, and guarantees that classes that depend on the parent type can use any instance of a child type interchangeably. This principle is also one of the SOLID principles of object-oriented design.

Let’s take a look at an example:

class Author {
    private $name;

    public function __construct($name) {
        $this->name = $name;
    }

    public function getName() {
        return $this->name;
    }
}

abstract class Post {
    public function getAuthor() {
        return 'Johannes';
    }
}

class BlogPost extends Post {
    public function getAuthor() {
        return new Author('Johannes');
    }
}

class ForumPost extends Post { /* ... */ }

function my_function(Post $post) {
    echo strtoupper($post->getAuthor());
}

Our function my_function expects a Post object, and outputs the author of the post. The base class Post returns a simple string and outputting a simple string will work just fine. However, the child class BlogPost which is a sub-type of Post instead decided to return an object, and is therefore violating the SOLID principles. If a BlogPost were passed to my_function, PHP would not complain, but ultimately fail when executing the strtoupper call in its body.

579
            }
580
            $nodes = $fake->import($source);
581
            foreach ($nodes as $node) {
582
                $fake->root->appendChild($node);
583
            }
584
        } else {
585
            // string markup
586
            $this->documentFragmentLoadMarkup($fake, $charset, $source);
587
        }
588
        return $fake;
589
    }
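
The Bug Best Practice finding in documentFragmentCreate() arises because the method is documented as returning a DOMDocumentWrapper but can also return false. Two common ways to reconcile this are to document the return type as `DOMDocumentWrapper|false`, or to throw on failure so that the documented type always holds. The second option is sketched below under stated assumptions: createFragmentDocument() is a hypothetical standalone function, it returns a plain DOMDocument instead of the wrapper, and it handles XML fragments only.

```php
<?php

/**
 * Sketch: wrap fragment markup in a <fake> root and parse it.
 *
 * @return DOMDocument always a document, never false
 * @throws RuntimeException when the markup cannot be parsed
 */
function createFragmentDocument(string $markup): DOMDocument
{
    $document = new DOMDocument();

    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $loaded   = $document->loadXML('<fake>' . $markup . '</fake>', LIBXML_NONET);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        // Failure is signalled by an exception, so the return type stays uniform.
        throw new RuntimeException('Error loading documentFragment markup');
    }

    return $document;
}
```

With this shape, a caller such as import() would no longer need the `$fake === false` check, because a failed load surfaces as an exception instead.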
590
591
    /**
592
     *
593
     * @param $document DOMDocumentWrapper
594
     * @param $markup
595
     * @return $document
Documentation introduced by
The doc-type $document could not be parsed: Unknown type name "$document" at position 0. (view supported doc-types)

This check marks PHPDoc comments that could not be parsed by our parser. To see which comment annotations we can parse, please refer to our documentation on supported doc-types.
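For comparison, a docblock that PHPDoc parsers can generally read puts the type before the variable name and names a real type after `@return`. The sketch below shows that tag order on a hypothetical standalone helper, `buildFakeXml()`, which is not part of DOMDocumentWrapper; it only mirrors the string concatenation used in the plain-XML branch of documentFragmentLoadMarkup().

```php
<?php
/**
 * Wraps a markup fragment in an XML declaration and a <fake> root element.
 *
 * Hypothetical helper for illustration only; not part of DOMDocumentWrapper.
 *
 * @param string $charset Encoding name written into the XML declaration.
 * @param string $markup  Fragment markup to wrap.
 * @return string The complete XML document as a string.
 */
function buildFakeXml(string $charset, string $markup): string
{
    return '<?xml version="1.0" encoding="' . $charset . '"?>'
        . '<fake>' . $markup . '</fake>';
}

echo buildFakeXml('utf-8', '<item/>'), PHP_EOL;
```

For the method flagged here, `@return bool` would presumably match the code, since its body returns either true or false.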

     */
    private function documentFragmentLoadMarkup($fragment, $charset, $markup = null)
    {
        // TODO error handling
        // TODO copy doctype
        // temporarily turn off
        $fragment->isDocumentFragment = false;
        if ($fragment->isXML) {
            if ($fragment->isXHTML) {
                // add FAKE element to set default namespace
                $fragment->loadMarkupXML(
                    '<?xml version="1.0" encoding="' . $charset
                    . '"?>'
                    . '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
                    . '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
                    . '<fake xmlns="http://www.w3.org/1999/xhtml">' . $markup . '</fake>'
                );
                $fragment->root = $fragment->document->firstChild->nextSibling;
            } else {
                $fragment->loadMarkupXML(
                    '<?xml version="1.0" encoding="' . $charset
                    . '"?><fake>' . $markup . '</fake>'
                );
                $fragment->root = $fragment->document->firstChild;
            }
        } else {
            $markup2 = PhpQuery::$defaultDoctype
                . '<html><head><meta http-equiv="Content-Type" content="text/html;charset='
                . $charset . '"></head>';
            $noBody  = strpos($markup, '<body') === false;
            if ($noBody) {
                $markup2 .= '<body>';
            }
            $markup2 .= $markup;
            if ($noBody) {
                $markup2 .= '</body>';
            }
            $markup2 .= '</html>';
            $fragment->loadMarkupHTML($markup2);
            // TODO resolve body tag merging issue
            // NOTE: both branches of the ternary are currently identical
            $fragment->root = $noBody ? $fragment->document->firstChild->nextSibling->firstChild->nextSibling
                : $fragment->document->firstChild->nextSibling->firstChild->nextSibling;
        }
        if (!$fragment->root) {
            return false;
        }
        $fragment->isDocumentFragment = true;
        return true;
    }

    protected function documentFragmentToMarkup($fragment)
    {
        PhpQuery::debug('documentFragmentToMarkup');
        $tmp                          = $fragment->isDocumentFragment;
        $fragment->isDocumentFragment = false;
        $markup                       = $fragment->markup();
        if ($fragment->isXML) {
            $markup = substr($markup, 0, strrpos($markup, '</fake>'));
            if ($fragment->isXHTML) {
Duplication introduced by
This code seems to be duplicated across your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.
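The flagged branches all cut a wrapper element (`<fake>` or `<body>`) off the serialized markup with paired strpos/substr calls. As a rough sketch of the extraction this check suggests, the hypothetical helper `stripWrapper()` below does that in one place. It is not a drop-in replacement: it assumes no attribute value in the opening tag contains a `>` character, and it returns the input unchanged when the wrapper is missing.

```php
<?php
/**
 * Returns the markup between an opening tag and the last matching closing tag.
 *
 * Hypothetical helper; assumes no attribute value contains a ">" character.
 *
 * @param string $markup     Serialized markup including the wrapper element.
 * @param string $openPrefix Start of the opening tag, for example "<body".
 * @param string $closeTag   Full closing tag, for example "</body>".
 * @return string The inner markup, or the input unchanged if no wrapper is found.
 */
function stripWrapper(string $markup, string $openPrefix, string $closeTag): string
{
    $open  = strpos($markup, $openPrefix);
    $close = strrpos($markup, $closeTag);
    if ($open === false || $close === false) {
        return $markup; // wrapper not found: leave the markup untouched
    }
    $openEnd = strpos($markup, '>', $open); // end of the opening tag
    if ($openEnd === false || $openEnd >= $close) {
        return $markup;
    }
    return substr($markup, $openEnd + 1, $close - $openEnd - 1);
}

echo stripWrapper('<html><body><p>hi</p></body></html>', '<body', '</body>'), PHP_EOL;
echo stripWrapper('<fake xmlns="http://www.w3.org/1999/xhtml"><br/></fake>', '<fake', '</fake>'), PHP_EOL;
```

Searching for the end of the opening tag also avoids hard-coded offsets that depend on the exact length of the wrapper tag.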

                // 43 is the length of '<fake xmlns="http://www.w3.org/1999/xhtml">'
                $markup = substr($markup, strpos($markup, '<fake') + 43);
            } else {
                $markup = substr($markup, strpos($markup, '<fake>') + 6);
            }
        } else {
Duplication introduced by
This code seems to be duplicated across your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

You can also find more detailed suggestions in the “Code” section of your repository.

            $markup = substr($markup, strpos($markup, '<body>') + 6);
            $markup = substr($markup, 0, strrpos($markup, '</body>'));
        }
        $fragment->isDocumentFragment = $tmp;
        if (PhpQuery::$debug) {
            PhpQuery::debug('documentFragmentToMarkup: ' . substr($markup, 0, 150));
        }
        return $markup;
    }

    /**
     * Return document markup, starting with optional $nodes as root.
     *
     * @param $nodes    \DOMNode|\DOMNodeList
     * @return string
     */
    public function markup($nodes = null, $innerMarkup = false)
    {
        if (isset($nodes) && count($nodes) == 1
            && $nodes[0] instanceof \DOMDocument
        ) {
            $nodes = null;
        }
        if (isset($nodes)) {
            $markup = '';
            if (!is_array($nodes) && !($nodes instanceof \DOMNodeList)) {
                $nodes = array(
                    $nodes
                );
            }
            if ($this->isDocumentFragment && !$innerMarkup) {
                foreach ($nodes as $i => $node) {
                    if ($node->isSameNode($this->root)) {
                        //	var_dump($node);
Unused Code Comprehensibility introduced by
67% of this comment could be valid code. Did you maybe forget this after debugging?

Sometimes obsolete code just ends up commented out instead of removed. In this case it is better to remove the code once you have checked you do not need it.

The code might also have been commented out for debugging purposes. In this case it is vital that someone uncomments it again or your project may behave in very unexpected ways in production.

This check looks for comments that seem to be mostly valid code and reports them.

                        // array_merge() concatenates the three lists; the "+" union
                        // operator would drop entries whose numeric keys collide
                        $nodes = array_merge(
                            array_slice($nodes, 0, $i),
                            PhpQuery::DOMNodeListToArray($node->childNodes),
                            array_slice($nodes, $i + 1)
                        );
                    }
                }
            }
            if ($this->isXML && !$innerMarkup) {
                self::debug("Getting outerXML with charset '{$this->charset}'");
                // we need outerXML, so we can benefit from
                // $node param support in saveXML()
                foreach ($nodes as $node) {
                    $markup .= $this->document->saveXML($node);
                }
            } else {
                $loop = array();
                if ($innerMarkup) {
                    foreach ($nodes as $node) {
                        if ($node->childNodes) {
                            foreach ($node->childNodes as $child) {
                                $loop[] = $child;
                            }
                        } else {
                            $loop[] = $node;
                        }
                    }
                } else {
                    $loop = $nodes;
                }
                self::debug(
                    "Getting markup, moving selected nodes (" . count($loop)
                    . ") to new DocumentFragment"
                );
                $fake   = $this->documentFragmentCreate($loop);
                $markup = $this->documentFragmentToMarkup($fake);
            }
            if ($this->isXHTML) {
                self::debug("Fixing XHTML");
                $markup = self::markupFixXHTML($markup);
            }
            self::debug("Markup: " . substr($markup, 0, 250));
            return $markup;
        } else {
            if ($this->isDocumentFragment) {
                // documentFragment, html only...
                self::debug("Getting markup, DocumentFragment detected");
                //				return $this->markup(
Unused Code Comprehensibility introduced by
58% of this comment could be valid code. Did you maybe forget this after debugging?

Sometimes obsolete code just ends up commented out instead of removed. In this case it is better to remove the code once you have checked you do not need it.

The code might also have been commented out for debugging purposes. In this case it is vital that someone uncomments it again or your project may behave in very unexpected ways in production.

This check looks for comments that seem to be mostly valid code and reports them.

                ////					$this->document->getElementsByTagName('body')->item(0)
                //					$this->document->root, true
Unused Code Comprehensibility introduced by
56% of this comment could be valid code. Did you maybe forget this after debugging?

Sometimes obsolete code just ends up commented out instead of removed. In this case it is better to remove the code once you have checked you do not need it.

The code might also have been commented out for debugging purposes. In this case it is vital that someone uncomments it again or your project may behave in very unexpected ways in production.

This check looks for comments that seem to be mostly valid code and reports them.

                //				);
                $markup = $this->documentFragmentToMarkup($this);
                // no need for markupFixXHTML, as it's done through markup($nodes) method
                return $markup;
            } else {
                self::debug(
                    "Getting markup (" . ($this->isXML ? 'XML' : 'HTML')
                    . "), final with charset '{$this->charset}'"
                );
                $markup = $this->isXML ? $this->document->saveXML()
                    : $this->document->saveHTML();
                if ($this->isXHTML) {
                    self::debug("Fixing XHTML");
                    $markup = self::markupFixXHTML($markup);
                }
                self::debug("Markup: " . substr($markup, 0, 250));
                return $markup;
            }
        }
    }

    protected static function markupFixXHTML($markup)
    {
        $markup = self::expandEmptyTag('script', $markup);
        $markup = self::expandEmptyTag('select', $markup);
        $markup = self::expandEmptyTag('textarea', $markup);
        return $markup;
    }

    public static function debug($text)
    {
        PhpQuery::debug($text);
    }

    /**
     * expandEmptyTag
     *
     * Rewrites self-closing occurrences of $tag that carry attributes
     * (for example <script src="a.js"/>) into an open/close pair.
     *
     * @param $tag
     * @param $xml
     * @return string
     * @author mjaque at ilkebenson dot com
     * @link   http://php.net/manual/en/domdocument.savehtml.php#81256
     */
    public static function expandEmptyTag($tag, $xml)
    {
        $indice = 0;
        while ($indice < strlen($xml)) {
            $pos = strpos($xml, "<$tag ", $indice);
            // strpos() returns 0 for a match at the very start, so test against false
            if ($pos !== false) {
                $posCierre = strpos($xml, ">", $pos);
                if ($posCierre === false) {
                    // unterminated tag: nothing left to expand
                    break;
                }
                if ($xml[$posCierre - 1] == "/") {
                    $xml = substr_replace($xml, "></$tag>", $posCierre - 1, 2);
                }
                $indice = $posCierre;
            } else {
                break;
            }
        }
        return $xml;
    }
}
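
A note on the strpos() result that expandEmptyTag() branches on: strpos() returns the integer 0 when the needle sits at the very start of the string, and 0 is falsy in PHP, so a loose truthiness check treats that match as "not found". The standalone sketch below (the sample string is made up for illustration) contrasts the loose and the strict check.

```php
<?php
// Sample input chosen so that the tag starts at offset 0.
$xml = '<script src="a.js"/>';
$pos = strpos($xml, '<script ');

// Loose check: offset 0 is falsy, so a match at the start looks like "not found".
$looseFound = (bool) $pos;

// Strict check: only the boolean false means "not found".
$strictFound = ($pos !== false);

var_dump($pos, $looseFound, $strictFound);
```

In markup produced by saveXML() or saveHTML() the tag is rarely the first character, which may be why a loose check can go unnoticed, but the strict comparison costs nothing.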