Completed
Push — master ( 5eb7b7...2a7445 )
by Lars
02:01 queued 11s
created

XmlDomParser::loadXml()   A

Complexity

Conditions 1
Paths 1

Size

Total Lines 6

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 3
CRAP Score 1

Importance

Changes 0
Metric Value
dl 0
loc 6
ccs 3
cts 3
cp 1
rs 10
c 0
b 0
f 0
cc 1
nc 1
nop 2
crap 1
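The CRAP score above can be reproduced from the complexity and coverage figures. The following is a minimal sketch, assuming the commonly cited formula CRAP = cc^2 * (1 - coverage)^3 + cc, and assuming that `cc` in the table is the cyclomatic complexity and `cp` the covered proportion (the page itself does not define these abbreviations). `crapScore()` is a hypothetical helper name.

```php
<?php

declare(strict_types=1);

/**
 * CRAP (Change Risk Anti-Patterns) score for a single method.
 *
 * @param int   $complexity cyclomatic complexity (assumed to be "cc" in the table)
 * @param float $coverage   covered proportion from 0.0 to 1.0 (assumed to be "cp")
 */
function crapScore(int $complexity, float $coverage): float
{
    return $complexity ** 2 * (1.0 - $coverage) ** 3 + $complexity;
}

// Values reported for XmlDomParser::loadXml(): cc = 1, cp = 1.
echo crapScore(1, 1.0), PHP_EOL; // prints 1
```

With full coverage the cubic term vanishes, so the score equals the complexity itself.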
<?php

declare(strict_types=1);

namespace voku\helper;

/**
 * @property-read string $plaintext
 *                                 <p>Get dom node's plain text.</p>
 *
 * @method static XmlDomParser file_get_xml($xml, $libXMLExtraOptions = null)
 *                                 <p>Load XML from file.</p>
 * @method static XmlDomParser str_get_xml($xml, $libXMLExtraOptions = null)
 *                                 <p>Load XML from string.</p>
 */
class XmlDomParser extends AbstractDomParser
{
    /**
     * @param \DOMNode|SimpleXmlDomInterface|string|null $element XML string, SimpleXmlDomInterface or \DOMNode
     */
    public function __construct($element = null)
    // Scrutinizer (duplication): this method seems to be duplicated in your project.
    {
        $this->document = new \DOMDocument('1.0', $this->getEncoding());

        // DOMDocument settings
        $this->document->preserveWhiteSpace = true;
        $this->document->formatOutput = true;

        if ($element instanceof SimpleXmlDomInterface) {
            $element = $element->getNode();
        }

        if ($element instanceof \DOMNode) {
            $domNode = $this->document->importNode($element, true);

            if ($domNode instanceof \DOMNode) {
                /** @noinspection UnusedFunctionResultInspection */
                $this->document->appendChild($domNode);
            }

            return;
        }

        if ($element !== null) {
            /** @noinspection UnusedFunctionResultInspection */
            $this->loadXml($element);
        }
    }

    /**
     * @param string $name
     * @param array  $arguments
     *
     * @throws \BadMethodCallException
     * @throws \RuntimeException
     *
     * @return XmlDomParser
     */
    public static function __callStatic($name, $arguments)
    // Scrutinizer (duplication): this method seems to be duplicated in your project.
    {
        $arguments0 = $arguments[0] ?? '';

        $arguments1 = $arguments[1] ?? null;

        if ($name === 'str_get_xml') {
            $parser = new static();

            return $parser->loadXml($arguments0, $arguments1);
        }

        if ($name === 'file_get_xml') {
            $parser = new static();

            return $parser->loadXmlFile($arguments0, $arguments1);
            // Scrutinizer (bug, best practice): the return type of the statement above (self) is incompatible
            // with the return type declared by the abstract method
            // voku\helper\AbstractDomParser::__callStatic (voku\helper\AbstractDomParser).
            // A returned value should be a sub-type of the type given by the parent type, for example an
            // interface or abstract method (Liskov substitution principle), so that code depending on the
            // parent type can use any instance of a child type interchangeably.
        }

        throw new \BadMethodCallException('Method does not exist');
    }

    /** @noinspection MagicMethodsValidityInspection */

    /**
     * @param string $name
     *
     * @return string|null
     */
    public function __get($name)
    {
        $name = \strtolower($name);

        if ($name === 'plaintext') {
            return $this->text();
        }

        return null;
    }

    /**
     * @return string
     */
    public function __toString()
    {
        return $this->xml(false, false, true, 0);
    }

    /**
     * Create DOMDocument from XML.
     *
     * @param string   $xml
     * @param int|null $libXMLExtraOptions
     *
     * @return \DOMDocument
     */
    protected function createDOMDocument(string $xml, $libXMLExtraOptions = null): \DOMDocument
    {
        // set error level
        $internalErrors = \libxml_use_internal_errors(true);
        $disableEntityLoader = \libxml_disable_entity_loader(true);
        \libxml_clear_errors();

        $optionsXml = \LIBXML_DTDLOAD | \LIBXML_DTDATTR | \LIBXML_NONET;

        if (\defined('LIBXML_BIGLINES')) {
            $optionsXml |= \LIBXML_BIGLINES;
        }

        if (\defined('LIBXML_COMPACT')) {
            $optionsXml |= \LIBXML_COMPACT;
        }

        if ($libXMLExtraOptions !== null) {
            $optionsXml |= $libXMLExtraOptions;
        }

        $xml = self::replaceToPreserveHtmlEntities($xml);

        $documentFound = false;
        $sxe = \simplexml_load_string($xml, \SimpleXMLElement::class, $optionsXml);
        if ($sxe !== false && \count(\libxml_get_errors()) === 0) {
            // Scrutinizer (duplication): this code seems to be duplicated across your project.
            $domElementTmp = \dom_import_simplexml($sxe);
            if ($domElementTmp) {
                $documentFound = true;
                $this->document = $domElementTmp->ownerDocument;
            }
        }

        if ($documentFound === false) {
            // Scrutinizer (duplication): this code seems to be duplicated across your project.
            // UTF-8 hack: http://php.net/manual/en/domdocument.loadhtml.php#95251
            $xmlHackUsed = false;
            // stripos() takes the haystack first: prepend a declaration only if $xml does not start with one
            if (\stripos($xml, '<?xml') !== 0) {
                $xmlHackUsed = true;
                $xml = '<?xml encoding="' . $this->getEncoding() . '" ?>' . $xml;
            }

            $this->document->loadXML($xml, $optionsXml);

            // remove the "xml-encoding" hack
            if ($xmlHackUsed) {
                foreach ($this->document->childNodes as $child) {
                    if ($child->nodeType === \XML_PI_NODE) {
                        /** @noinspection UnusedFunctionResultInspection */
                        $this->document->removeChild($child);

                        break;
                    }
                }
            }
        }

        // set encoding
        $this->document->encoding = $this->getEncoding();

        // restore lib-xml settings
        \libxml_clear_errors();
        \libxml_use_internal_errors($internalErrors);
        \libxml_disable_entity_loader($disableEntityLoader);

        return $this->document;
    }

    /**
     * Find list of nodes with a CSS selector.
     *
     * @param string   $selector
     * @param int|null $idx
     *
     * @return SimpleXmlDomInterface|SimpleXmlDomInterface[]|SimpleXmlDomNodeInterface
     */
    public function find(string $selector, $idx = null)
    // Scrutinizer (duplication): this method seems to be duplicated in your project.
    {
        $xPathQuery = SelectorConverter::toXPath($selector);

        $xPath = new \DOMXPath($this->document);
        $nodesList = $xPath->query($xPathQuery);
        $elements = new SimpleXmlDomNode();

        foreach ($nodesList as $node) {
            $elements[] = new SimpleXmlDom($node);
        }

        // return all elements
        if ($idx === null) {
            if (\count($elements) === 0) {
                return new SimpleXmlDomNodeBlank();
            }

            return $elements;
        }

        // handle negative values
        if ($idx < 0) {
            $idx = \count($elements) + $idx;
        }

        // return one element
        return $elements[$idx] ?? new SimpleXmlDomBlank();
    }

    /**
     * Find nodes with a CSS selector.
     *
     * @param string $selector
     *
     * @return SimpleXmlDomInterface[]|SimpleXmlDomNodeInterface
     */
    public function findMulti(string $selector): SimpleXmlDomNodeInterface
    {
        return $this->find($selector, null);
    }

    /**
     * Find nodes with a CSS selector or false, if no element is found.
     *
     * @param string $selector
     *
     * @return false|SimpleXmlDomInterface[]|SimpleXmlDomNodeInterface
     */
    public function findMultiOrFalse(string $selector)
    {
        $return = $this->find($selector, null);

        if ($return instanceof SimpleXmlDomNodeBlank) {
            return false;
        }

        return $return;
    }

    /**
     * Find one node with a CSS selector.
     *
     * @param string $selector
     *
     * @return SimpleXmlDomInterface
     */
    public function findOne(string $selector): SimpleXmlDomInterface
    {
        return $this->find($selector, 0);
    }

    /**
     * Find one node with a CSS selector or false, if no element is found.
     *
     * @param string $selector
     *
     * @return false|SimpleXmlDomInterface
     */
    public function findOneOrFalse(string $selector)
    {
        $return = $this->find($selector, 0);

        if ($return instanceof SimpleXmlDomBlank) {
            return false;
        }

        return $return;
    }

    /**
     * @param string $content
     * @param bool   $multiDecodeNewHtmlEntity
     *
     * @return string
     */
    public function fixHtmlOutput(string $content, bool $multiDecodeNewHtmlEntity = false): string
    {
        $content = $this->decodeHtmlEntity($content, $multiDecodeNewHtmlEntity);

        return self::putReplacedBackToPreserveHtmlEntities($content);
    }

    /**
     * Return elements by ".class".
     *
     * @param string $class
     *
     * @return SimpleXmlDomInterface[]|SimpleXmlDomNodeInterface
     */
    public function getElementByClass(string $class): SimpleXmlDomNodeInterface
    {
        return $this->findMulti(".{$class}");
    }

    /**
     * Return element by #id.
     *
     * @param string $id
     *
     * @return SimpleXmlDomInterface
     */
    public function getElementById(string $id): SimpleXmlDomInterface
    {
        return $this->findOne("#{$id}");
    }

    /**
     * Return element by tag name.
     *
     * @param string $name
     *
     * @return SimpleXmlDomInterface
     */
    public function getElementByTagName(string $name): SimpleXmlDomInterface
    {
        $node = $this->document->getElementsByTagName($name)->item(0);

        if ($node === null) {
            return new SimpleXmlDomBlank();
        }

        return new SimpleXmlDom($node);
    }

    /**
     * Returns elements by "#id".
     *
     * @param string   $id
     * @param int|null $idx
     *
     * @return SimpleXmlDomInterface|SimpleXmlDomInterface[]|SimpleXmlDomNodeInterface
     */
    public function getElementsById(string $id, $idx = null)
    {
        return $this->find("#{$id}", $idx);
    }

    /**
     * Returns elements by tag name.
     *
     * @param string   $name
     * @param int|null $idx
     *
     * @return SimpleXmlDomInterface|SimpleXmlDomInterface[]|SimpleXmlDomNodeInterface
     */
    public function getElementsByTagName(string $name, $idx = null)
    // Scrutinizer (duplication): this method seems to be duplicated in your project.
    {
        $nodesList = $this->document->getElementsByTagName($name);

        $elements = new SimpleXmlDomNode();

        foreach ($nodesList as $node) {
            $elements[] = new SimpleXmlDom($node);
        }

        // return all elements
        if ($idx === null) {
            if (\count($elements) === 0) {
                return new SimpleXmlDomNodeBlank();
            }

            return $elements;
        }

        // handle negative values
        if ($idx < 0) {
            $idx = \count($elements) + $idx;
        }

        // return one element
        // a single-element lookup falls back to the single-element blank, consistent with find()
        return $elements[$idx] ?? new SimpleXmlDomBlank();
    }

    /**
     * Get dom node's outer html.
     *
     * @param bool $multiDecodeNewHtmlEntity
     *
     * @return string
     */
    public function html(bool $multiDecodeNewHtmlEntity = false): string
    {
        if (static::$callback !== null) {
            \call_user_func(static::$callback, [$this]);
        }

        $content = $this->document->saveHTML();

        if ($content === false) {
            return '';
        }

        return $this->fixHtmlOutput($content, $multiDecodeNewHtmlEntity);
    }

    /**
     * Load HTML from string.
     *
     * @param string   $html
     * @param int|null $libXMLExtraOptions
     *
     * @return self
     */
    public function loadHtml(string $html, $libXMLExtraOptions = null): DomParserInterface
    {
        $this->document = $this->createDOMDocument($html, $libXMLExtraOptions);

        return $this;
        // Scrutinizer (bug, best practice): the return type of the statement above (voku\helper\XmlDomParser)
        // is incompatible with the return type declared by the interface
        // voku\helper\DomParserInterface::loadHtml (self).
    }

    /**
     * Load HTML from file.
     *
     * @param string   $filePath
     * @param int|null $libXMLExtraOptions
     *
     * @throws \RuntimeException
     *
     * @return XmlDomParser
     */
    public function loadHtmlFile(string $filePath, $libXMLExtraOptions = null): DomParserInterface
    // Scrutinizer (duplication): this method seems to be duplicated in your project.
    {
        if (
            !\preg_match("/^https?:\/\//i", $filePath)
            &&
            !\file_exists($filePath)
        ) {
            throw new \RuntimeException("File {$filePath} not found");
        }

        try {
            if (\class_exists('\voku\helper\UTF8')) {
                /** @noinspection PhpUndefinedClassInspection */
                $html = UTF8::file_get_contents($filePath);
            } else {
                $html = \file_get_contents($filePath);
            }
        } catch (\Exception $e) {
            throw new \RuntimeException("Could not load file {$filePath}");
        }

        if ($html === false) {
            throw new \RuntimeException("Could not load file {$filePath}");
        }

        return $this->loadHtml($html, $libXMLExtraOptions);
        // Scrutinizer (bug, best practice): the return type of the statement above (voku\helper\XmlDomParser)
        // is incompatible with the return type declared by the interface
        // voku\helper\DomParserInterface::loadHtmlFile (self).
    }

    /**
     * @param string   $selector
     * @param int|null $idx
     *
     * @return SimpleXmlDomInterface|SimpleXmlDomInterface[]|SimpleXmlDomNodeInterface
     */
    public function __invoke($selector, $idx = null)
    {
        return $this->find($selector, $idx);
    }

    /**
     * Load XML from string.
     *
     * @param string   $xml
     * @param int|null $libXMLExtraOptions
     *
     * @return XmlDomParser
     */
    public function loadXml(string $xml, $libXMLExtraOptions = null): self
    {
        $this->document = $this->createDOMDocument($xml, $libXMLExtraOptions);

        return $this;
    }

    /**
     * Load XML from file.
     *
     * @param string   $filePath
     * @param int|null $libXMLExtraOptions
     *
     * @throws \RuntimeException
     *
     * @return XmlDomParser
     */
    public function loadXmlFile(string $filePath, $libXMLExtraOptions = null): self
    // Scrutinizer (duplication): this method seems to be duplicated in your project.
    {
        if (
            !\preg_match("/^https?:\/\//i", $filePath)
            &&
            !\file_exists($filePath)
        ) {
            throw new \RuntimeException("File {$filePath} not found");
        }

        try {
            if (\class_exists('\voku\helper\UTF8')) {
                /** @noinspection PhpUndefinedClassInspection */
                $xml = UTF8::file_get_contents($filePath);
            } else {
                $xml = \file_get_contents($filePath);
            }
        } catch (\Exception $e) {
            throw new \RuntimeException("Could not load file {$filePath}");
        }

        if ($xml === false) {
            throw new \RuntimeException("Could not load file {$filePath}");
        }

        return $this->loadXml($xml, $libXMLExtraOptions);
    }

    /**
     * @param callable      $callback
     * @param \DOMNode|null $domNode
     */
    public function replaceTextWithCallback($callback, \DOMNode $domNode = null)
    {
        if ($domNode === null) {
            $domNode = $this->document;
        }

        if ($domNode->hasChildNodes()) {
            $children = [];

            // since looping through a DOM being modified is a bad idea we prepare an array:
            foreach ($domNode->childNodes as $child) {
                $children[] = $child;
            }

            foreach ($children as $child) {
                if ($child->nodeType === \XML_TEXT_NODE) {
                    $oldText = self::putReplacedBackToPreserveHtmlEntities($child->wholeText);
                    $newText = $callback($oldText);
                    if ($domNode->ownerDocument) {
                        $newTextNode = $domNode->ownerDocument->createTextNode(self::replaceToPreserveHtmlEntities($newText));
                        $domNode->replaceChild($newTextNode, $child);
                    }
                } else {
                    $this->replaceTextWithCallback($callback, $child);
                }
            }
        }
    }
}
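
createDOMDocument() switches libxml to internal error handling, parses the input, and then restores the previous setting. The following is a minimal standalone sketch of that save/parse/restore pattern, using only the built-in DOM and libxml extensions. `parseXmlOrNull()` is a hypothetical helper for illustration and not part of the class above; it also omits the SimpleXML attempt and the entity handling that the class performs.

```php
<?php

declare(strict_types=1);

/**
 * Parse an XML string and return null instead of emitting warnings on malformed input.
 * Hypothetical helper for illustration only.
 */
function parseXmlOrNull(string $xml): ?DOMDocument
{
    // Collect parse errors internally and remember the previous setting.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $document = new DOMDocument('1.0', 'UTF-8');
    $loaded = $document->loadXML($xml, LIBXML_NONET);
    $errors = libxml_get_errors();

    // Restore the caller's libxml state before returning.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return ($loaded && count($errors) === 0) ? $document : null;
}

$ok = parseXmlOrNull('<root><item>1</item></root>');
$broken = parseXmlOrNull('<root><item>1</root>');

var_dump($ok instanceof DOMDocument); // bool(true)
var_dump($broken);                    // NULL
```

Restoring the saved value, rather than hard-coding `false`, keeps the helper from changing error handling for code that had already enabled internal errors.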
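
replaceTextWithCallback() copies the child nodes into an array before replacing any of them, because `childNodes` is a live list. The sketch below shows the same idea on a plain DOMDocument. `mapTextNodes()` is a hypothetical name, it reads `nodeValue` rather than `wholeText`, and the entity-preservation helpers used by the class are left out.

```php
<?php

declare(strict_types=1);

/**
 * Apply $callback to every text node below $node (hypothetical helper).
 */
function mapTextNodes(DOMNode $node, callable $callback): void
{
    // Snapshot the live childNodes list so replacements do not disturb the iteration.
    $children = iterator_to_array($node->childNodes);

    foreach ($children as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $owner = $node instanceof DOMDocument ? $node : $node->ownerDocument;
            $node->replaceChild($owner->createTextNode($callback($child->nodeValue)), $child);
        } else {
            // Recurse into elements and other container nodes.
            mapTextNodes($child, $callback);
        }
    }
}

$document = new DOMDocument();
$document->loadXML('<note><to>alice</to><body>hello <b>world</b></body></note>');

mapTextNodes($document, 'strtoupper');

echo $document->saveXML($document->documentElement), PHP_EOL;
// <note><to>ALICE</to><body>HELLO <b>WORLD</b></body></note>
```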