Completed
Push — master ( bfaa7e...162bcd )
by Lars
04:53 queued 11s

SimpleHtmlDom::childNodes()   A

Complexity
    Conditions      3
    Paths           3

Size
    Total Lines     14

Duplication
    Lines           0
    Ratio           0 %

Code Coverage
    Tests           7
    CRAP Score      3

Importance
    Changes         0

Metric  Value
dl      0
loc     14
ccs     7
cts     7
cp      1
rs      9.7998
c       0
b       0
f       0
cc      3
nc      3
nop     1
crap    3
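
The CRAP score in the table above can be reproduced from the other metrics. The sketch below uses the commonly published CRAP formula, complexity² × (1 − coverage)³ + complexity, and assumes that `cc` is the cyclomatic complexity and `cp` the covered fraction (0 to 1). The function name `crapScore` is made up for illustration and is not part of the report or the library.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: CRAP = complexity^2 * (1 - coverage)^3 + complexity.
 *
 * @param int   $complexity cyclomatic complexity (assumed to be "cc" above)
 * @param float $coverage   covered fraction between 0.0 and 1.0 (assumed to be "cp" above)
 */
function crapScore(int $complexity, float $coverage): float
{
    return $complexity ** 2 * (1.0 - $coverage) ** 3 + $complexity;
}

// Values reported for childNodes(): cc = 3, cp = 1.
$reported = crapScore(3, 1.0);

// The same method without any test coverage, for comparison.
$uncovered = crapScore(3, 0.0);

echo $reported, ' vs. ', $uncovered, "\n";
```

With full coverage the penalty term vanishes and the score equals the complexity, which matches the reported value of 3.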
<?php

declare(strict_types=1);

namespace voku\helper;

use BadMethodCallException;
use DOMElement;
use DOMNode;
use RuntimeException;

/**
 * @property string outerText <p>Get dom node's outer html (alias for "outerHtml").</p>
 * @property string outerHtml <p>Get dom node's outer html.</p>
 * @property string innerText <p>Get dom node's inner html (alias for "innerHtml").</p>
 * @property string innerHtml <p>Get dom node's inner html.</p>
 * @property-read string     plaintext <p>Get dom node's plain text.</p>
 * @property-read string     tag       <p>Get dom node name.</p>
 * @property-read array|null attr      <p>Get dom node attributes.</p>
 *
 * @method SimpleHtmlDom|SimpleHtmlDom[]|SimpleHtmlDomNode|null children(int $idx = -1) <p>Returns children of node.</p>
 * @method SimpleHtmlDom|null first_child() <p>Returns the first child of node.</p>
 * @method SimpleHtmlDom|null last_child() <p>Returns the last child of node.</p>
 * @method SimpleHtmlDom|null next_sibling() <p>Returns the next sibling of node.</p>
 * @method SimpleHtmlDom|null prev_sibling() <p>Returns the previous sibling of node.</p>
 * @method SimpleHtmlDom|null parent() <p>Returns the parent of node.</p>
 * @method string outerText() <p>Get dom node's outer html (alias for "outerHtml()").</p>
 * @method string outerHtml() <p>Get dom node's outer html.</p>
 * @method string innerText() <p>Get dom node's inner html (alias for "innerHtml()").</p>
 */
class SimpleHtmlDom implements \IteratorAggregate
{
    /**
     * Maps the legacy snake_case / lowercase method names to the real methods.
     *
     * @var array
     */
    protected static $functionAliases = [
        'children'     => 'childNodes',
        'first_child'  => 'firstChild',
        'last_child'   => 'lastChild',
        'next_sibling' => 'nextSibling',
        'prev_sibling' => 'previousSibling',
        'parent'       => 'parentNode',
        'outertext'    => 'html',
        'outerhtml'    => 'html',
        'innertext'    => 'innerHtml',
        'innerhtml'    => 'innerHtml',
    ];

    /**
     * Documented as DOMNode as well, because the constructor accepts any DOMNode
     * (for example the text nodes wrapped by getIterator()). The inspection
     * reported the former "DOMElement"-only declaration as a documentation bug.
     *
     * @var DOMElement|DOMNode
     */
    protected $node;

    /**
     * SimpleHtmlDom constructor.
     *
     * @param DOMNode $node
     */
    public function __construct(DOMNode $node)
    {
        $this->node = $node;
    }

    /**
     * Note: the inspection reports this method as duplicated elsewhere in the project.
     *
     * @param string $name
     * @param array  $arguments
     *
     * @throws \BadMethodCallException
     *
     * @return SimpleHtmlDom|string|null
     */
    public function __call($name, $arguments)
    {
        $name = \strtolower($name);

        if (isset(self::$functionAliases[$name])) {
            return \call_user_func_array([$this, self::$functionAliases[$name]], $arguments);
        }

        throw new BadMethodCallException('Method does not exist');
    }

    /**
     * @param string $name
     *
     * @return array|string|null
     */
    public function __get($name)
    {
        $name = \strtolower($name);

        switch ($name) {
            case 'outerhtml':
            case 'outertext':
                return $this->html();
            case 'innerhtml':
            case 'innertext':
                return $this->innerHtml();
            case 'text':
            case 'plaintext':
                return $this->text();
            case 'tag':
                return $this->node->nodeName;
            case 'attr':
                return $this->getAllAttributes();
            default:
                return $this->getAttribute($name);
        }
    }

    /**
     * @param string $selector
     * @param int    $idx
     *
     * @return SimpleHtmlDom|SimpleHtmlDom[]|SimpleHtmlDomNodeInterface
     */
    public function __invoke($selector, $idx = null)
    {
        return $this->find($selector, $idx);
    }

    /**
     * @param string $name
     *
     * @return bool
     */
    public function __isset($name)
    {
        $name = \strtolower($name);

        switch ($name) {
            case 'outertext':
            case 'outerhtml':
            case 'innertext':
            case 'innerhtml':
            case 'plaintext':
            case 'text':
            case 'tag':
                return true;
            default:
                return $this->hasAttribute($name);
        }
    }

    /**
     * @param string      $name
     * @param string|null $value
     *
     * @return SimpleHtmlDom|null <p>NULL if the node was removed via an empty "outerhtml" / "outertext".</p>
     */
    public function __set($name, $value)
    {
        $name = \strtolower($name);

        switch ($name) {
            case 'outerhtml':
            case 'outertext':
                return $this->replaceNode($value);
            case 'innertext':
            case 'innerhtml':
                return $this->replaceChild($value);
            default:
                return $this->setAttribute($name, $value);
        }
    }

    /**
     * @return string
     */
    public function __toString()
    {
        return $this->html();
    }

    /**
     * @param string $name
     *
     * @return SimpleHtmlDom
     */
    public function __unset($name)
    {
        return $this->removeAttribute($name);
    }

    /**
     * Returns children of node.
     *
     * @param int $idx
     *
     * @return SimpleHtmlDom|SimpleHtmlDom[]|SimpleHtmlDomNode|null
     */
    public function childNodes(int $idx = -1)
    {
        $nodeList = $this->getIterator();

        if ($idx === -1) {
            return $nodeList;
        }

        if (isset($nodeList[$idx])) {
            return $nodeList[$idx];
        }

        return null;
    }

    /**
     * Find list of nodes with a CSS selector.
     *
     * @param string   $selector
     * @param int|null $idx
     *
     * @return SimpleHtmlDom|SimpleHtmlDom[]|SimpleHtmlDomNodeInterface
     */
    public function find(string $selector, $idx = null)
    {
        return $this->getHtmlDomParser()->find($selector, $idx);
    }

    /**
     * Find one node with a CSS selector.
     *
     * @param string $selector
     *
     * @return SimpleHtmlDom|SimpleHtmlDomNodeInterface
     */
    public function findOne(string $selector)
    {
        return $this->find($selector, 0);
    }

    /**
     * Returns the first child of node.
     *
     * @return SimpleHtmlDom|null
     */
    public function firstChild()
    {
        $node = $this->node->firstChild;

        if ($node === null) {
            return null;
        }

        return new self($node);
    }

    /**
     * Returns an array of attributes.
     *
     * @return array|null
     */
    public function getAllAttributes()
    {
        if ($this->node->hasAttributes()) {
            $attributes = [];
            foreach ($this->node->attributes as $attr) {
                $attributes[$attr->name] = HtmlDomParser::putReplacedBackToPreserveHtmlEntities($attr->value);
            }

            return $attributes;
        }

        return null;
    }

    /**
     * Return attribute value.
     *
     * @param string $name
     *
     * @return string
     */
    public function getAttribute(string $name): string
    {
        $html = $this->node->getAttribute($name);

        return HtmlDomParser::putReplacedBackToPreserveHtmlEntities($html);
    }

    /**
     * Return element by #id.
     *
     * @param string $id
     *
     * @return SimpleHtmlDom|SimpleHtmlDomNodeInterface
     */
    public function getElementById(string $id)
    {
        // "{$id}" instead of "${id}": the "${}" interpolation form is deprecated as of PHP 8.2.
        return $this->find("#{$id}", 0);
    }

    /**
     * Return element by tag name.
     *
     * @param string $name
     *
     * @return SimpleHtmlDom|SimpleHtmlDomNodeBlank
     */
    public function getElementByTagName(string $name)
    {
        $node = $this->node->getElementsByTagName($name)->item(0);

        if ($node === null) {
            return new SimpleHtmlDomNodeBlank();
        }

        return new self($node);
    }

    /**
     * Returns elements by #id.
     *
     * @param string   $id
     * @param int|null $idx
     *
     * @return SimpleHtmlDom|SimpleHtmlDom[]|SimpleHtmlDomNodeInterface
     */
    public function getElementsById(string $id, $idx = null)
    {
        // "{$id}" instead of "${id}": the "${}" interpolation form is deprecated as of PHP 8.2.
        return $this->find("#{$id}", $idx);
    }

    /**
     * Returns elements by tag name.
     *
     * Note: the inspection reports this method as duplicated elsewhere in the project.
     *
     * @param string   $name
     * @param int|null $idx
     *
     * @return SimpleHtmlDom|SimpleHtmlDom[]|SimpleHtmlDomNode|SimpleHtmlDomNodeBlank
     */
    public function getElementsByTagName(string $name, $idx = null)
    {
        $nodesList = $this->node->getElementsByTagName($name);

        $elements = new SimpleHtmlDomNode();

        foreach ($nodesList as $node) {
            $elements[] = new self($node);
        }

        // return all elements
        if ($idx === null) {
            return $elements;
        }

        // handle negative values
        if ($idx < 0) {
            $idx = \count($elements) + $idx;
        }

        // return one element
        if (isset($elements[$idx])) {
            return $elements[$idx];
        }

        // return a blank-element
        return new SimpleHtmlDomNodeBlank();
    }

    /**
     * Create a new "HtmlDomParser"-object from the current context.
     *
     * @return HtmlDomParser
     */
    public function getHtmlDomParser(): HtmlDomParser
    {
        return new HtmlDomParser($this);
    }

    /**
     * Retrieve an external iterator.
     *
     * @see  http://php.net/manual/en/iteratoraggregate.getiterator.php
     *
     * @return SimpleHtmlDomNode An instance of an object implementing <b>Iterator</b> or
     * <b>Traversable</b>
     */
    public function getIterator(): SimpleHtmlDomNode
    {
        $elements = new SimpleHtmlDomNode();
        if ($this->node->hasChildNodes()) {
            foreach ($this->node->childNodes as $node) {
                $elements[] = new self($node);
            }
        }

        return $elements;
    }

    /**
     * @return DOMNode
     */
    public function getNode(): \DOMNode
    {
        return $this->node;
    }

    /**
     * Determine if an attribute exists on the element.
     *
     * @param string $name
     *
     * @return bool
     */
    public function hasAttribute(string $name): bool
    {
        return $this->node->hasAttribute($name);
    }

    /**
     * Get dom node's outer html.
     *
     * @param bool $multiDecodeNewHtmlEntity
     *
     * @return string
     */
    public function html(bool $multiDecodeNewHtmlEntity = false): string
    {
        return $this->getHtmlDomParser()->html($multiDecodeNewHtmlEntity);
    }

    /**
     * Get dom node's inner html.
     *
     * @param bool $multiDecodeNewHtmlEntity
     *
     * @return string
     */
    public function innerHtml(bool $multiDecodeNewHtmlEntity = false): string
    {
        return $this->getHtmlDomParser()->innerHtml($multiDecodeNewHtmlEntity);
    }

    /**
     * Returns the last child of node.
     *
     * @return SimpleHtmlDom|null
     */
    public function lastChild()
    {
        $node = $this->node->lastChild;

        if ($node === null) {
            return null;
        }

        return new self($node);
    }

    /**
     * Returns the next sibling of node.
     *
     * @return SimpleHtmlDom|null
     */
    public function nextSibling()
    {
        $node = $this->node->nextSibling;

        if ($node === null) {
            return null;
        }

        return new self($node);
    }

    /**
     * Returns the parent of node.
     *
     * Assumes the node has a parent: "parentNode" is NULL for a detached node,
     * and passing NULL to the constructor raises a TypeError.
     *
     * @return SimpleHtmlDom
     */
    public function parentNode(): self
    {
        return new self($this->node->parentNode);
    }

    /**
     * Returns the previous sibling of node.
     *
     * @return SimpleHtmlDom|null
     */
    public function previousSibling()
    {
        $node = $this->node->previousSibling;

        if ($node === null) {
            return null;
        }

        return new self($node);
    }

    /**
     * Replace child node.
     *
     * @param string $string
     *
     * @throws \RuntimeException
     *
     * @return $this
     */
    protected function replaceChild(string $string)
    {
        $newDocument = null;

        if (!empty($string)) {
            $newDocument = new HtmlDomParser($string);

            if ($this->normalizeStringForComparision($newDocument) !== $this->normalizeStringForComparision($string)) {
                throw new RuntimeException('Not valid HTML fragment');
            }
        }

        // "childNodes" is a live list: removing nodes inside a "foreach" over it
        // skips entries, so always remove the current first child instead.
        while ($this->node->firstChild !== null) {
            $this->node->removeChild($this->node->firstChild);
        }

        if ($newDocument !== null) {
            $newDocument = $this->cleanHtmlWrapper($newDocument);
            $newNode = $this->node->ownerDocument->importNode($newDocument->getDocument()->documentElement, true);
            $this->node->appendChild($newNode);
        }

        return $this;
    }

    /**
     * Replace this node.
     *
     * @param string $string
     *
     * @throws \RuntimeException
     *
     * @return $this|null
     */
    protected function replaceNode(string $string)
    {
        if (empty($string)) {
            $this->node->parentNode->removeChild($this->node);

            return null;
        }

        $newDocument = new HtmlDomParser($string);

        if ($this->normalizeStringForComparision($newDocument->outerText()) !== $this->normalizeStringForComparision($string)) {
            throw new RuntimeException('Not valid HTML fragment');
        }

        $newDocument = $this->cleanHtmlWrapper($newDocument);

        $newNode = $this->node->ownerDocument->importNode($newDocument->getDocument()->documentElement, true);

        $this->node->parentNode->replaceChild($newNode, $this->node);
        $this->node = $newNode;

        return $this;
    }

    /**
     * Normalize the given input for comparison: lowercase it, strip spaces and
     * line breaks, and treat self-closing tags ("/>") like plain closing brackets.
     *
     * @param HtmlDomParser|string $input
     *
     * @return string
     */
    private function normalizeStringForComparision($input): string
    {
        if ($input instanceof HtmlDomParser) {
            $string = $input->outerText();

            if ($input->getIsDOMDocumentCreatedWithoutHeadWrapper() === true) {
                /** @noinspection HtmlRequiredTitleElement */
                $string = \str_replace(['<head>', '</head>'], '', $string);
            }
        } else {
            $string = (string) $input;
        }

        return
            \urlencode(
                \urldecode(
                    \trim(
                        \str_replace(
                            [
                                ' ',
                                "\n",
                                "\r",
                                '/>',
                            ],
                            [
                                '',
                                '',
                                '',
                                '>',
                            ],
                            \strtolower($string)
                        )
                    )
                )
            );
    }

    /**
     * @param HtmlDomParser $newDocument
     *
     * @return HtmlDomParser
     */
    protected function cleanHtmlWrapper(HtmlDomParser $newDocument): HtmlDomParser
    {
        if (
            $newDocument->getIsDOMDocumentCreatedWithoutHtml() === true
            ||
            $newDocument->getIsDOMDocumentCreatedWithoutHtmlWrapper() === true
        ) {

            // Remove doc-type node.
            if ($newDocument->getDocument()->doctype !== null) {
                $newDocument->getDocument()->doctype->parentNode->removeChild($newDocument->getDocument()->doctype);
            }

            // Remove html element, preserving child nodes.
            $html = $newDocument->getDocument()->getElementsByTagName('html')->item(0);
            $fragment = $newDocument->getDocument()->createDocumentFragment();
            if ($html !== null) {
                while ($html->childNodes->length > 0) {
                    $fragment->appendChild($html->childNodes->item(0));
                }
                $html->parentNode->replaceChild($fragment, $html);
            }

            // Remove body element, preserving child nodes.
            $body = $newDocument->getDocument()->getElementsByTagName('body')->item(0);
            $fragment = $newDocument->getDocument()->createDocumentFragment();
            if ($body instanceof \DOMElement) {
                while ($body->childNodes->length > 0) {
                    $fragment->appendChild($body->childNodes->item(0));
                }
                $body->parentNode->replaceChild($fragment, $body);

                // At this point DOMDocument still added a "<p>"-wrapper around our string,
                // so we replace it with "<simpleHtmlDomP>" and delete this at the ending ...
                $item = $newDocument->getDocument()->getElementsByTagName('p')->item(0);
                if ($item !== null) {
                    $this->changeElementName($item, 'simpleHtmlDomP');
                }
            }
        }

        return $newDocument;
    }

    /**
     * Change the name of a tag in a "DOMNode".
     *
     * @param DOMNode $node
     * @param string  $name
     *
     * @return DOMElement
     */
    protected function changeElementName(\DOMNode $node, string $name): \DOMElement
    {
        $newnode = $node->ownerDocument->createElement($name);

        foreach ($node->childNodes as $child) {
            $child = $node->ownerDocument->importNode($child, true);
            $newnode->appendChild($child);
        }

        foreach ($node->attributes as $attrName => $attrNode) {
            // "setAttribute()" expects a string value, not the "DOMAttr" object itself.
            $newnode->setAttribute($attrName, $attrNode->value);
        }

        $newnode->ownerDocument->replaceChild($newnode, $node);

        return $newnode;
    }

    /**
     * Set attribute value.
     *
     * @param string      $name       <p>The name of the html-attribute.</p>
     * @param string|null $value      <p>Set to NULL or empty string, to remove the attribute.</p>
     * @param bool        $strict     <p>
     *                                $value must be NULL, to remove the attribute,
     *                                so that you can set an empty string as attribute-value e.g. autofocus=""
     *                                </p>
     *
     * @return $this
     */
    public function setAttribute(string $name, $value = null, bool $strict = false)
    {
        if (
            ($strict === true && $value === null)
            ||
            ($strict === false && empty($value))
        ) {
            $this->node->removeAttribute($name);
        } else {
            $this->node->setAttribute($name, $value);
        }

        return $this;
    }

    /**
     * Remove attribute.
     *
     * @param string $name <p>The name of the html-attribute.</p>
     *
     * @return $this
     */
    public function removeAttribute(string $name)
    {
        $this->node->removeAttribute($name);

        return $this;
    }

    /**
     * Get dom node's plain text.
     *
     * @return string
     */
    public function text(): string
    {
        return $this->getHtmlDomParser()->fixHtmlOutput($this->node->textContent);
    }
}
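
The replaceChild() method of the class has to empty a node before it appends the new content. Because `DOMNode::$childNodes` is a live list, removing nodes while iterating over it can skip entries. A minimal sketch of a safe way to remove all children, using only ext-dom, is shown here; the helper `removeAllChildren()` is hypothetical and not part of the library.

```php
<?php

declare(strict_types=1);

// Build a small document with three child elements (plain ext-dom, no library classes).
$document = new DOMDocument();
$document->loadXML('<ul><li>a</li><li>b</li><li>c</li></ul>');

$list = $document->documentElement;

/**
 * Hypothetical helper (an assumption for illustration): remove every child of $parent.
 */
function removeAllChildren(DOMNode $parent): void
{
    // Always take the current first child, so the live list can never
    // shift underneath an iterator.
    while ($parent->firstChild !== null) {
        $parent->removeChild($parent->firstChild);
    }
}

$before = $list->childNodes->length;

removeAllChildren($list);

$after = $list->childNodes->length;

echo $before, ' -> ', $after, "\n";
```

Copying the children into an array first and removing them in a second loop would work as well; the `while` loop simply avoids the intermediate array.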