Completed
Push — master ( fefe4d...d45b97 )
by Lars
01:59
created

XmlDomParser::loadXmlFile()   B

Complexity

Conditions 6
Paths 8

Size

Total Lines 27

Duplication

Lines 27
Ratio 100 %

Code Coverage

Tests 6
CRAP Score 9.3798

Importance

Changes 0
Metric Value
dl 27
loc 27
ccs 6
cts 11
cp 0.5455
rs 8.8657
c 0
b 0
f 0
cc 6
nc 8
nop 2
crap 9.3798
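
The CRAP score in the table can be reproduced approximately from the complexity and coverage figures. The sketch below assumes the commonly cited formula, complexity² × (1 − coverage)³ + complexity, with cc 6 as the complexity and ccs / cts (6 of 11) as the covered fraction. The function name `crapScore` is hypothetical, and the exact rounding the report applies is not known, so the result matches the reported 9.3798 only to about two decimal places.

```php
<?php
declare(strict_types=1);

/**
 * CRAP score under the commonly cited formula (an assumption about how the
 * report computes it): complexity^2 * (1 - coverage)^3 + complexity.
 */
function crapScore(int $complexity, float $coverage): float
{
    return $complexity ** 2 * (1.0 - $coverage) ** 3 + $complexity;
}

// cc = 6, ccs / cts = 6 / 11 covered statements (cp 0.5455 in the table).
$score = crapScore(6, 6 / 11);

printf("%.2f\n", $score); // prints 9.38
```

With full coverage the second factor vanishes and the score falls back to the plain complexity, which is why adding tests for the uncovered branches lowers the score without touching the code.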
<?php

declare(strict_types=1);

namespace voku\helper;

/**
 * @property-read string $plaintext
 *                                 <p>Get dom node's plain text.</p>
 *
 * @method static XmlDomParser file_get_xml($xml, $libXMLExtraOptions = null)
 *                                 <p>Load XML from file.</p>
 * @method static XmlDomParser str_get_xml($xml, $libXMLExtraOptions = null)
 *                                 <p>Load XML from string.</p>
 */
class XmlDomParser extends AbstractDomParser
{
    // Review note (duplication): This method seems to be duplicated in your project.
    //
    // Duplicated code is one of the most pungent code smells. If you need to duplicate the same
    // code in three or more different places, we strongly encourage you to look into extracting
    // the code into a single class or operation.
    //
    // You can also find more detailed suggestions in the “Code” section of your repository.

    /**
     * @param \DOMNode|SimpleXmlDomInterface|string $element XML code or SimpleXmlDomInterface, \DOMNode
     */
    public function __construct($element = null)
    {
        $this->document = new \DOMDocument('1.0', $this->getEncoding());

        // DOMDocument settings
        $this->document->preserveWhiteSpace = true;
        $this->document->formatOutput = true;

        if ($element instanceof SimpleXmlDomInterface) {
            $element = $element->getNode();
        }

        if ($element instanceof \DOMNode) {
            $domNode = $this->document->importNode($element, true);

            if ($domNode instanceof \DOMNode) {
                /** @noinspection UnusedFunctionResultInspection */
                $this->document->appendChild($domNode);
            }

            return;
        }

        if ($element !== null) {
            /** @noinspection UnusedFunctionResultInspection */
            $this->loadXml($element);
        }
    }

    // Review note (duplication): This method seems to be duplicated in your project.

    /**
     * @param string $name
     * @param array  $arguments
     *
     * @throws \BadMethodCallException
     * @throws \RuntimeException
     *
     * @return XmlDomParser
     */
    public static function __callStatic($name, $arguments)
    {
        $arguments0 = $arguments[0] ?? '';

        $arguments1 = $arguments[1] ?? null;

        if ($name === 'str_get_xml') {
            $parser = new static();

            return $parser->loadXml($arguments0, $arguments1);
        }

        if ($name === 'file_get_xml') {
            $parser = new static();

            // Review note (bug, best practice): The return type of
            // `return $parser->loadXmlF...guments0, $arguments1);` (self) is incompatible with the
            // return type declared by the abstract method
            // voku\helper\AbstractDomParser::__callStatic of type voku\helper\AbstractDomParser.
            //
            // If you return a value from a function or method, it should be a sub-type of the type
            // that is given by the parent type, e.g. an interface or abstract method. This is more
            // formally defined by the Liskov substitution principle, and guarantees that classes
            // that depend on the parent type can use any instance of a child type interchangeably.
            // This principle also belongs to the SOLID principles for object-oriented design.
            //
            // Let’s take a look at an example:
            //
            //     class Author {
            //         private $name;
            //
            //         public function __construct($name) {
            //             $this->name = $name;
            //         }
            //
            //         public function getName() {
            //             return $this->name;
            //         }
            //     }
            //
            //     abstract class Post {
            //         public function getAuthor() {
            //             return 'Johannes';
            //         }
            //     }
            //
            //     class BlogPost extends Post {
            //         public function getAuthor() {
            //             return new Author('Johannes');
            //         }
            //     }
            //
            //     class ForumPost extends Post { /* ... */ }
            //
            //     function my_function(Post $post) {
            //         echo strtoupper($post->getAuthor());
            //     }
            //
            // The function my_function expects a Post object and outputs the author of the post.
            // The base class Post returns a simple string, and outputting a simple string works
            // just fine. However, the child class BlogPost, which is a sub-type of Post, returns
            // an object instead and therefore violates the SOLID principles. If a BlogPost were
            // passed to my_function, PHP would not complain, but it would ultimately fail when
            // executing the strtoupper call in its body.
            return $parser->loadXmlFile($arguments0, $arguments1);
        }

        throw new \BadMethodCallException('Method does not exist');
    }

    /** @noinspection MagicMethodsValidityInspection */

    /**
     * @param string $name
     *
     * @return string|null
     */
    public function __get($name)
    {
        $name = \strtolower($name);

        if ($name === 'plaintext') {
            return $this->text();
        }

        return null;
    }

    /**
     * @return string
     */
    public function __toString()
    {
        return $this->xml(false, false, true, 0);
    }

    /**
     * Create DOMDocument from XML.
     *
     * @param string   $xml
     * @param int|null $libXMLExtraOptions
     *
     * @return \DOMDocument
     */
    protected function createDOMDocument(string $xml, $libXMLExtraOptions = null): \DOMDocument
    {
        // set error level
        $internalErrors = \libxml_use_internal_errors(true);
        $disableEntityLoader = \libxml_disable_entity_loader(true);
        \libxml_clear_errors();

        $optionsXml = \LIBXML_DTDLOAD | \LIBXML_DTDATTR | \LIBXML_NONET;

        if (\defined('LIBXML_BIGLINES')) {
            $optionsXml |= \LIBXML_BIGLINES;
        }

        if (\defined('LIBXML_COMPACT')) {
            $optionsXml |= \LIBXML_COMPACT;
        }

        if ($libXMLExtraOptions !== null) {
            $optionsXml |= $libXMLExtraOptions;
        }

        $xml = self::replaceToPreserveHtmlEntities($xml);

        $documentFound = false;
        $sxe = \simplexml_load_string($xml, \SimpleXMLElement::class, $optionsXml);
        // Review note (duplication): This code seems to be duplicated across your project.
        if ($sxe !== false && \count(\libxml_get_errors()) === 0) {
            $domElementTmp = \dom_import_simplexml($sxe);
            if ($domElementTmp) {
                $documentFound = true;
                $this->document = $domElementTmp->ownerDocument;
            }
        }

        // Review note (duplication): This code seems to be duplicated across your project.
        if ($documentFound === false) {

            // UTF-8 hack: http://php.net/manual/en/domdocument.loadhtml.php#95251
            $xmlHackUsed = false;
            // stripos() expects ($haystack, $needle): only prepend the declaration
            // when the input does not already start with one
            if (\stripos($xml, '<?xml') !== 0) {
                $xmlHackUsed = true;
                $xml = '<?xml encoding="' . $this->getEncoding() . '" ?>' . $xml;
            }

            $this->document->loadXML($xml, $optionsXml);

            // remove the "xml-encoding" hack
            if ($xmlHackUsed) {
                foreach ($this->document->childNodes as $child) {
                    if ($child->nodeType === \XML_PI_NODE) {
                        /** @noinspection UnusedFunctionResultInspection */
                        $this->document->removeChild($child);

                        break;
                    }
                }
            }
        }

        // set encoding
        $this->document->encoding = $this->getEncoding();

        // restore lib-xml settings
        \libxml_clear_errors();
        \libxml_use_internal_errors($internalErrors);
        \libxml_disable_entity_loader($disableEntityLoader);

        return $this->document;
    }

    // Review note (duplication): This method seems to be duplicated in your project.

    /**
     * Find list of nodes with a CSS selector.
     *
     * @param string   $selector
     * @param int|null $idx
     *
     * @return SimpleXmlDomInterface|SimpleXmlDomInterface[]|SimpleXmlDomNodeInterface
     */
    public function find(string $selector, $idx = null)
    {
        $xPathQuery = SelectorConverter::toXPath($selector);

        $xPath = new \DOMXPath($this->document);
        $nodesList = $xPath->query($xPathQuery);
        $elements = new SimpleXmlDomNode();

        foreach ($nodesList as $node) {
            $elements[] = new SimpleXmlDom($node);
        }

        // return all elements
        if ($idx === null) {
            if (\count($elements) === 0) {
                return new SimpleXmlDomNodeBlank();
            }

            return $elements;
        }

        // handle negative values
        if ($idx < 0) {
            $idx = \count($elements) + $idx;
        }

        // return one element
        return $elements[$idx] ?? new SimpleXmlDomBlank();
    }

    /**
     * Find nodes with a CSS selector.
     *
     * @param string $selector
     *
     * @return SimpleXmlDomInterface[]|SimpleXmlDomNodeInterface
     */
    public function findMulti(string $selector): SimpleXmlDomNodeInterface
    {
        return $this->find($selector, null);
    }

    /**
     * Find one node with a CSS selector.
     *
     * @param string $selector
     *
     * @return SimpleXmlDomInterface
     */
    public function findOne(string $selector): SimpleXmlDomInterface
    {
        return $this->find($selector, 0);
    }

    /**
     * @param string $content
     * @param bool   $multiDecodeNewHtmlEntity
     *
     * @return string
     */
    public function fixHtmlOutput(string $content, bool $multiDecodeNewHtmlEntity = false): string
    {
        $content = $this->decodeHtmlEntity($content, $multiDecodeNewHtmlEntity);

        return self::putReplacedBackToPreserveHtmlEntities($content);
    }

    /**
     * Return elements by .class.
     *
     * @param string $class
     *
     * @return SimpleXmlDomInterface[]|SimpleXmlDomNodeInterface
     */
    public function getElementByClass(string $class): SimpleXmlDomNodeInterface
    {
        return $this->findMulti(".{$class}");
    }

    /**
     * Return element by #id.
     *
     * @param string $id
     *
     * @return SimpleXmlDomInterface
     */
    public function getElementById(string $id): SimpleXmlDomInterface
    {
        return $this->findOne("#{$id}");
    }

    /**
     * Return element by tag name.
     *
     * @param string $name
     *
     * @return SimpleXmlDomInterface
     */
    public function getElementByTagName(string $name): SimpleXmlDomInterface
    {
        $node = $this->document->getElementsByTagName($name)->item(0);

        if ($node === null) {
            return new SimpleXmlDomBlank();
        }

        return new SimpleXmlDom($node);
    }

    /**
     * Returns elements by #id.
     *
     * @param string   $id
     * @param int|null $idx
     *
     * @return SimpleXmlDomInterface|SimpleXmlDomInterface[]|SimpleXmlDomNodeInterface
     */
    public function getElementsById(string $id, $idx = null)
    {
        return $this->find("#{$id}", $idx);
    }

    // Review note (duplication): This method seems to be duplicated in your project.

    /**
     * Returns elements by tag name.
     *
     * @param string   $name
     * @param int|null $idx
     *
     * @return SimpleXmlDomInterface|SimpleXmlDomInterface[]|SimpleXmlDomNodeInterface
     */
    public function getElementsByTagName(string $name, $idx = null)
    {
        $nodesList = $this->document->getElementsByTagName($name);

        $elements = new SimpleXmlDomNode();

        foreach ($nodesList as $node) {
            $elements[] = new SimpleXmlDom($node);
        }

        // return all elements
        if ($idx === null) {
            if (\count($elements) === 0) {
                return new SimpleXmlDomNodeBlank();
            }

            return $elements;
        }

        // handle negative values
        if ($idx < 0) {
            $idx = \count($elements) + $idx;
        }

        // return one element
        return $elements[$idx] ?? new SimpleXmlDomNodeBlank();
    }

    /**
     * Get dom node's outer html.
     *
     * @param bool $multiDecodeNewHtmlEntity
     *
     * @return string
     */
    public function html(bool $multiDecodeNewHtmlEntity = false): string
    {
        if ($this::$callback !== null) {
            \call_user_func($this::$callback, [$this]);
        }

        $content = $this->document->saveHTML();

        if ($content === false) {
            return '';
        }

        return $this->fixHtmlOutput($content, $multiDecodeNewHtmlEntity);
    }

    /**
     * Load HTML from string.
     *
     * @param string   $html
     * @param int|null $libXMLExtraOptions
     *
     * @return self
     */
    public function loadHtml(string $html, $libXMLExtraOptions = null): DomParserInterface
    {
        $this->document = $this->createDOMDocument($html, $libXMLExtraOptions);

        // Review note (bug, best practice): The return type of `return $this;`
        // (voku\helper\XmlDomParser) is incompatible with the return type declared by the
        // interface voku\helper\DomParserInterface::loadHtml of type self.
        return $this;
    }

    // Review note (duplication): This method seems to be duplicated in your project.

    /**
     * Load HTML from file.
     *
     * @param string   $filePath
     * @param int|null $libXMLExtraOptions
     *
     * @throws \RuntimeException
     *
     * @return XmlDomParser
     */
    public function loadHtmlFile(string $filePath, $libXMLExtraOptions = null): DomParserInterface
    {
        if (
            !\preg_match("/^https?:\/\//i", $filePath)
            &&
            !\file_exists($filePath)
        ) {
            throw new \RuntimeException("File {$filePath} not found");
        }

        try {
            if (\class_exists('\voku\helper\UTF8')) {
                /** @noinspection PhpUndefinedClassInspection */
                $html = UTF8::file_get_contents($filePath);
            } else {
                $html = \file_get_contents($filePath);
            }
        } catch (\Exception $e) {
            throw new \RuntimeException("Could not load file {$filePath}");
        }

        if ($html === false) {
            throw new \RuntimeException("Could not load file {$filePath}");
        }

        // Review note (bug, best practice): The return type of
        // `return $this->loadHtml($..., $libXMLExtraOptions);` (voku\helper\XmlDomParser) is
        // incompatible with the return type declared by the interface
        // voku\helper\DomParserInterface::loadHtmlFile of type self.
        return $this->loadHtml($html, $libXMLExtraOptions);
    }

    /**
     * @param string   $selector
     * @param int|null $idx
     *
     * @return SimpleXmlDomInterface|SimpleXmlDomInterface[]|SimpleXmlDomNodeInterface
     */
    public function __invoke($selector, $idx = null)
    {
        return $this->find($selector, $idx);
    }

    /**
     * Load XML from string.
     *
     * @param string   $xml
     * @param int|null $libXMLExtraOptions
     *
     * @return XmlDomParser
     */
    public function loadXml(string $xml, $libXMLExtraOptions = null): self
    {
        $this->document = $this->createDOMDocument($xml, $libXMLExtraOptions);

        return $this;
    }

    // Review note (duplication): This method seems to be duplicated in your project.

    /**
     * Load XML from file.
     *
     * @param string   $filePath
     * @param int|null $libXMLExtraOptions
     *
     * @throws \RuntimeException
     *
     * @return XmlDomParser
     */
    public function loadXmlFile(string $filePath, $libXMLExtraOptions = null): self
    {
        if (
            !\preg_match("/^https?:\/\//i", $filePath)
            &&
            !\file_exists($filePath)
        ) {
            throw new \RuntimeException("File {$filePath} not found");
        }

        try {
            if (\class_exists('\voku\helper\UTF8')) {
                /** @noinspection PhpUndefinedClassInspection */
                $xml = UTF8::file_get_contents($filePath);
            } else {
                $xml = \file_get_contents($filePath);
            }
        } catch (\Exception $e) {
            throw new \RuntimeException("Could not load file {$filePath}");
        }

        if ($xml === false) {
            throw new \RuntimeException("Could not load file {$filePath}");
        }

        return $this->loadXml($xml, $libXMLExtraOptions);
    }

    /**
     * @param callable      $callback
     * @param \DOMNode|null $domNode
     */
    public function replaceTextWithCallback($callback, \DOMNode $domNode = null)
    {
        if ($domNode === null) {
            $domNode = $this->document;
        }

        if ($domNode->hasChildNodes()) {
            $children = [];

            // since looping through a DOM being modified is a bad idea we prepare an array:
            foreach ($domNode->childNodes as $child) {
                $children[] = $child;
            }

            foreach ($children as $child) {
                if ($child->nodeType === \XML_TEXT_NODE) {
                    $oldText = self::putReplacedBackToPreserveHtmlEntities($child->wholeText);
                    $newText = $callback($oldText);
                    if ($domNode->ownerDocument) {
                        $newTextNode = $domNode->ownerDocument->createTextNode(self::replaceToPreserveHtmlEntities($newText));
                        $domNode->replaceChild($newTextNode, $child);
                    }
                } else {
                    $this->replaceTextWithCallback($callback, $child);
                }
            }
        }
    }
}
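
The UTF-8 hack in createDOMDocument() depends on detecting whether the input already begins with an XML declaration. PHP's `stripos()` takes the haystack first and the needle second, so the argument order decides whether that check can ever succeed. A minimal sketch of the check follows; the helper name `startsWithXmlDeclaration` is hypothetical and not part of the library.

```php
<?php
declare(strict_types=1);

function startsWithXmlDeclaration(string $xml): bool
{
    // stripos(haystack, needle): position 0 means the string starts with "<?xml".
    return \stripos($xml, '<?xml') === 0;
}

$withDeclaration = '<?xml version="1.0" encoding="UTF-8"?><root/>';
$withoutDeclaration = '<root/>';

var_dump(startsWithXmlDeclaration($withDeclaration));    // bool(true)
var_dump(startsWithXmlDeclaration($withoutDeclaration)); // bool(false)

// With the arguments swapped, the whole document is searched for inside the
// five-character string, which cannot match, so the result is false, not 0.
var_dump(\stripos('<?xml', $withDeclaration));           // bool(false)
```

Because `false !== 0` is true, a swapped call makes a `!== 0` guard pass for practically every input, so the declaration would be prepended even when one is already present.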
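
replaceTextWithCallback() copies the child nodes into an array before replacing any of them, because `childNodes` is a live list that changes while the tree is modified. The standalone sketch below shows the same pattern using only ext-dom. The function name `upperCaseTextNodes` and the sample document are illustrative assumptions, and `iterator_to_array()` stands in for the explicit copy loop used in the class.

```php
<?php
declare(strict_types=1);

function upperCaseTextNodes(\DOMNode $node): void
{
    $document = $node instanceof \DOMDocument ? $node : $node->ownerDocument;

    // Snapshot the live childNodes list before touching the tree.
    $children = \iterator_to_array($node->childNodes);

    foreach ($children as $child) {
        if ($child->nodeType === \XML_TEXT_NODE) {
            $replacement = $document->createTextNode(\strtoupper((string) $child->nodeValue));
            $node->replaceChild($replacement, $child);
        } else {
            // Recurse into element nodes, as the class method does.
            upperCaseTextNodes($child);
        }
    }
}

$doc = new \DOMDocument();
$doc->loadXML('<root><a>one</a><b>two</b></root>');
upperCaseTextNodes($doc->documentElement);

echo $doc->saveXML($doc->documentElement), "\n"; // <root><a>ONE</a><b>TWO</b></root>
```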