Completed
Push — master ( 6b1a7e...c5b279 )
by Lars
01:23
created

HtmlMin   F

Complexity

Total Complexity 168

Size/Duplication

Total Lines 1446
Duplicated Lines 0 %

Coupling/Cohesion

Components 11
Dependencies 3

Test Coverage

Coverage 91.49%

Importance

Changes 0
Metric Value
wmc 168
lcom 11
cbo 3
dl 0
loc 1446
ccs 344
cts 376
cp 0.9149
rs 0.8
c 0
b 0
f 0
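
The coverage figure above can be reproduced from the metric rows. This is a minimal sketch, assuming that `ccs` is the number of covered statements and `cts` the total number of statements; the report itself does not spell out those abbreviations, and the variable names are illustrative.

```php
<?php

declare(strict_types=1);

// Assumption: "ccs" = covered statements, "cts" = total statements.
$coveredStatements = 344; // ccs row
$totalStatements   = 376; // cts row

$coverage = $coveredStatements / $totalStatements;

echo \round($coverage, 4), "\n";                // compare with the "cp" row
echo \number_format($coverage * 100, 2), "%\n"; // compare with the "Coverage" figure
```

Under that assumption the ratio matches both the `cp` value and the reported coverage percentage.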

55 Methods

Rating   Name   Duplication   Size   Complexity  
A __construct() 0 6 1
A attachObserverToTheDomLoop() 0 4 1
A notifyObserversAboutDomElementBeforeMinification() 0 6 2
A notifyObserversAboutDomElementAfterMinification() 0 6 2
A doOptimizeAttributes() 0 6 1
A doOptimizeViaHtmlDomParser() 0 6 1
A doRemoveComments() 0 6 1
A doRemoveDefaultAttributes() 0 6 1
A doRemoveDeprecatedAnchorName() 0 6 1
A doRemoveDeprecatedScriptCharsetAttribute() 0 6 1
A doRemoveDeprecatedTypeFromScriptTag() 0 6 1
A doRemoveDeprecatedTypeFromStylesheetLink() 0 6 1
A doRemoveEmptyAttributes() 0 6 1
A doRemoveHttpPrefixFromAttributes() 0 6 1
A isDoSortCssClassNames() 0 4 1
A isDoSortHtmlAttributes() 0 4 1
A isDoRemoveDeprecatedScriptCharsetAttribute() 0 4 1
A isDoRemoveDefaultAttributes() 0 4 1
A isDoRemoveDeprecatedAnchorName() 0 4 1
A isDoRemoveDeprecatedTypeFromStylesheetLink() 0 4 1
A isDoRemoveDeprecatedTypeFromScriptTag() 0 4 1
A isDoRemoveValueFromEmptyInput() 0 4 1
A isDoRemoveEmptyAttributes() 0 4 1
A isDoSumUpWhitespace() 0 4 1
A isDoRemoveSpacesBetweenTags() 0 4 1
A isDoOptimizeViaHtmlDomParser() 0 4 1
A isDoOptimizeAttributes() 0 4 1
A isDoRemoveComments() 0 4 1
A isDoRemoveWhitespaceAroundTags() 0 4 1
A isDoRemoveOmittedQuotes() 0 4 1
A isDoRemoveOmittedHtmlTags() 0 4 1
A isDoRemoveHttpPrefixFromAttributes() 0 4 1
A getDomainsToRemoveHttpPrefixFromAttributes() 0 4 1
A doRemoveOmittedHtmlTags() 0 6 1
A doRemoveOmittedQuotes() 0 6 1
A doRemoveSpacesBetweenTags() 0 6 1
A doRemoveValueFromEmptyInput() 0 6 1
A doRemoveWhitespaceAroundTags() 0 6 1
A doSortCssClassNames() 0 6 1
A doSortHtmlAttributes() 0 6 1
A doSumUpWhitespace() 0 6 1
C domNodeAttributesToString() 0 57 14
F domNodeClosingTagOptional() 0 219 38
D domNodeToString() 0 87 25
A getNextSiblingOfTypeDOMElement() 0 8 3
A isConditionalComment() 0 12 3
B minifyHtmlDom() 0 64 6
B protectTags() 0 53 9
A removeComments() 0 15 3
B removeWhitespaceAroundTags() 0 25 6
A restoreProtectedHtml() 0 11 2
A setDomainsToRemoveHttpPrefixFromAttributes() 0 6 1
C minify() 0 124 9
A sumUpWhitespace() 0 27 5
A useKeepBrokenHtml() 0 6 1
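
The total complexity reported for the class appears to be the sum of the Complexity column in the method table. The sketch below checks that reading; the grouping of the values into three arrays is only for brevity and is not part of the report.

```php
<?php

declare(strict_types=1);

// Complexity column of the method table, in listing order.
$complexities = \array_merge(
    // __construct() .. notifyObserversAboutDomElementAfterMinification()
    [1, 1, 2, 2],
    // doOptimizeAttributes() .. doSumUpWhitespace(): 37 option setters/getters
    \array_fill(0, 37, 1),
    // domNodeAttributesToString() .. useKeepBrokenHtml()
    [14, 38, 25, 3, 3, 6, 9, 3, 6, 2, 1, 9, 5, 1]
);

echo \count($complexities), " methods\n";
echo \array_sum($complexities), " total complexity\n";
```

Three methods (domNodeClosingTagOptional(), domNodeToString() and domNodeAttributesToString()) account for 77 of the 168 points, which is why they carry the worst ratings.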

How to fix: Complexity

Complex Class

Complex classes like HtmlMin often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common way to find such a component is to look for fields and methods that share the same prefix or suffix. You can also look at the cohesion graph to spot unconnected or weakly connected components.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate, and is often faster.

While breaking up the class, it is a good idea to analyze how other classes use HtmlMin and, based on these observations, apply Extract Interface, too.
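
As a rough illustration of Extract Class, the many `do…`/`isDo…` option members could move into their own object. The sketch below is hypothetical: `HtmlMinOptions` and `SlimHtmlMin` are invented names, not part of the library, and the two transformations are simplified stand-ins for the real ones. Only the whitespace regular expression is taken from the listing.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical options holder extracted from HtmlMin (not part of the library).
 */
final class HtmlMinOptions
{
    /** @var array<string, bool> defaults mirror the property defaults in the listing */
    private $flags = [
        'sumUpWhitespace'         => true,
        'removeSpacesBetweenTags' => false,
    ];

    public function set(string $name, bool $value = true): self
    {
        if (!\array_key_exists($name, $this->flags)) {
            throw new \InvalidArgumentException("Unknown option: {$name}");
        }
        $this->flags[$name] = $value;

        return $this;
    }

    public function isEnabled(string $name): bool
    {
        return $this->flags[$name] ?? false;
    }
}

/**
 * Hypothetical slimmed-down minifier that delegates all option handling.
 */
final class SlimHtmlMin
{
    /** @var HtmlMinOptions */
    private $options;

    public function __construct(HtmlMinOptions $options)
    {
        $this->options = $options;
    }

    public function minify(string $html): string
    {
        if ($this->options->isEnabled('sumUpWhitespace')) {
            // Same pattern as HtmlMin::$regExSpace in the listing.
            $html = (string) \preg_replace("/[[:space:]]{2,}|[\r\n]+/u", ' ', $html);
        }
        if ($this->options->isEnabled('removeSpacesBetweenTags')) {
            // Simplified stand-in, not the library's implementation.
            $html = (string) \preg_replace('/>\s+</', '><', $html);
        }

        return $html;
    }
}

$options = (new HtmlMinOptions())->set('removeSpacesBetweenTags');
echo (new SlimHtmlMin($options))->minify("<p>a\n\n   b</p>   <p>c</p>"), "\n";
```

With the flags in one place, the minifier no longer needs a setter and a getter per option, which is where most of the 55 methods come from.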

<?php

declare(strict_types=1);

namespace voku\helper;

/**
 * Class HtmlMin
 *
 * Inspired by:
 * - JS: https://github.com/kangax/html-minifier/blob/gh-pages/src/htmlminifier.js
 * - PHP: https://github.com/searchturbine/phpwee-php-minifier
 * - PHP: https://github.com/WyriHaximus/HtmlCompress
 * - PHP: https://github.com/zaininnari/html-minifier
 * - PHP: https://github.com/ampaze/PHP-HTML-Minifier
 * - Java: https://code.google.com/archive/p/htmlcompressor/
 *
 * Ideas:
 * - http://perfectionkills.com/optimizing-html/
 */
class HtmlMin
{
    /**
     * @var string
     */
    private static $regExSpace = "/[[:space:]]{2,}|[\r\n]+/u";

    /**
     * @var array
     */
    private static $optional_end_tags = [
        'html',
        'head',
        'body',
    ];

    private static $selfClosingTags = [
        'area',
        'base',
        'basefont',
        'br',
        'col',
        'command',
        'embed',
        'frame',
        'hr',
        'img',
        'input',
        'isindex',
        'keygen',
        'link',
        'meta',
        'param',
        'source',
        'track',
        'wbr',
    ];

    private static $trimWhitespaceFromTags = [
        'article' => '',
        'br'      => '',
        'div'     => '',
        'footer'  => '',
        'hr'      => '',
        'nav'     => '',
        'p'       => '',
        'script'  => '',
    ];

    /**
     * @var array
     */
    private static $booleanAttributes = [
        'allowfullscreen' => '',
        'async'           => '',
        'autofocus'       => '',
        'autoplay'        => '',
        'checked'         => '',
        'compact'         => '',
        'controls'        => '',
        'declare'         => '',
        'default'         => '',
        'defaultchecked'  => '',
        'defaultmuted'    => '',
        'defaultselected' => '',
        'defer'           => '',
        'disabled'        => '',
        'enabled'         => '',
        'formnovalidate'  => '',
        'hidden'          => '',
        'indeterminate'   => '',
        'inert'           => '',
        'ismap'           => '',
        'itemscope'       => '',
        'loop'            => '',
        'multiple'        => '',
        'muted'           => '',
        'nohref'          => '',
        'noresize'        => '',
        'noshade'         => '',
        'novalidate'      => '',
        'nowrap'          => '',
        'open'            => '',
        'pauseonexit'     => '',
        'readonly'        => '',
        'required'        => '',
        'reversed'        => '',
        'scoped'          => '',
        'seamless'        => '',
        'selected'        => '',
        'sortable'        => '',
        'truespeed'       => '',
        'typemustmatch'   => '',
        'visible'         => '',
    ];

    /**
     * @var array
     */
    private static $skipTagsForRemoveWhitespace = [
        'code',
        'pre',
        'script',
        'style',
        'textarea',
    ];

    /**
     * @var array
     */
    private $protectedChildNodes = [];

    /**
     * @var string
     */
    private $protectedChildNodesHelper = 'html-min--voku--saved-content';

    /**
     * @var bool
     */
    private $doOptimizeViaHtmlDomParser = true;

    /**
     * @var bool
     */
    private $doOptimizeAttributes = true;

    /**
     * @var bool
     */
    private $doRemoveComments = true;

    /**
     * @var bool
     */
    private $doRemoveWhitespaceAroundTags = false;

    /**
     * @var bool
     */
    private $doRemoveOmittedQuotes = true;

    /**
     * @var bool
     */
    private $doRemoveOmittedHtmlTags = true;

    /**
     * @var bool
     */
    private $doRemoveHttpPrefixFromAttributes = false;

    /**
     * @var array
     */
    private $domainsToRemoveHttpPrefixFromAttributes = [
        'google.com',
        'google.de',
    ];

    /**
     * @var bool
     */
    private $doSortCssClassNames = true;

    /**
     * @var bool
     */
    private $doSortHtmlAttributes = true;

    /**
     * @var bool
     */
    private $doRemoveDeprecatedScriptCharsetAttribute = true;

    /**
     * @var bool
     */
    private $doRemoveDefaultAttributes = false;

    /**
     * @var bool
     */
    private $doRemoveDeprecatedAnchorName = true;

    /**
     * @var bool
     */
    private $doRemoveDeprecatedTypeFromStylesheetLink = true;

    /**
     * @var bool
     */
    private $doRemoveDeprecatedTypeFromScriptTag = true;

    /**
     * @var bool
     */
    private $doRemoveValueFromEmptyInput = true;

    /**
     * @var bool
     */
    private $doRemoveEmptyAttributes = true;

    /**
     * @var bool
     */
    private $doSumUpWhitespace = true;

    /**
     * @var bool
     */
    private $doRemoveSpacesBetweenTags = false;

    /**
     * @var bool
     */
    private $keepBrokenHtml = false;

    /**
     * @var bool
     */
    private $withDocType = false;

    /**
     * @var \SplObjectStorage|HtmlMinDomObserverInterface[]
     */
    private $domLoopObservers;

    /**
     * HtmlMin constructor.
     */
    public function __construct()
    {
        $this->domLoopObservers = new \SplObjectStorage();

        $this->attachObserverToTheDomLoop(new HtmlMinDomObserverOptimizeAttributes());
    }

    /**
     * @param HtmlMinDomObserverInterface $observer
     *
     * @return void
     */
    public function attachObserverToTheDomLoop(HtmlMinDomObserverInterface $observer)
    {
        $this->domLoopObservers->attach($observer);
    }

    /**
     * @param SimpleHtmlDom $domElement
     *
     * @return void
     */
    private function notifyObserversAboutDomElementBeforeMinification(SimpleHtmlDom $domElement)
    {
        foreach ($this->domLoopObservers as $observer) {
            $observer->domElementBeforeMinification($domElement, $this);
        }
    }

    /**
     * @param SimpleHtmlDom $domElement
     *
     * @return void
     */
    private function notifyObserversAboutDomElementAfterMinification(SimpleHtmlDom $domElement)
    {
        foreach ($this->domLoopObservers as $observer) {
            $observer->domElementAfterMinification($domElement, $this);
        }
    }

    /**
     * @param bool $doOptimizeAttributes
     *
     * @return $this
     */
    public function doOptimizeAttributes(bool $doOptimizeAttributes = true): self
    {
        $this->doOptimizeAttributes = $doOptimizeAttributes;

        return $this;
    }

    /**
     * @param bool $doOptimizeViaHtmlDomParser
     *
     * @return $this
     */
    public function doOptimizeViaHtmlDomParser(bool $doOptimizeViaHtmlDomParser = true): self
    {
        $this->doOptimizeViaHtmlDomParser = $doOptimizeViaHtmlDomParser;

        return $this;
    }

    /**
     * @param bool $doRemoveComments
     *
     * @return $this
     */
    public function doRemoveComments(bool $doRemoveComments = true): self
    {
        $this->doRemoveComments = $doRemoveComments;

        return $this;
    }

    /**
     * @param bool $doRemoveDefaultAttributes
     *
     * @return $this
     */
    public function doRemoveDefaultAttributes(bool $doRemoveDefaultAttributes = true): self
    {
        $this->doRemoveDefaultAttributes = $doRemoveDefaultAttributes;

        return $this;
    }

    /**
     * @param bool $doRemoveDeprecatedAnchorName
     *
     * @return $this
     */
    public function doRemoveDeprecatedAnchorName(bool $doRemoveDeprecatedAnchorName = true): self
    {
        $this->doRemoveDeprecatedAnchorName = $doRemoveDeprecatedAnchorName;

        return $this;
    }

    /**
     * @param bool $doRemoveDeprecatedScriptCharsetAttribute
     *
     * @return $this
     */
    public function doRemoveDeprecatedScriptCharsetAttribute(bool $doRemoveDeprecatedScriptCharsetAttribute = true): self
    {
        $this->doRemoveDeprecatedScriptCharsetAttribute = $doRemoveDeprecatedScriptCharsetAttribute;

        return $this;
    }

    /**
     * @param bool $doRemoveDeprecatedTypeFromScriptTag
     *
     * @return $this
     */
    public function doRemoveDeprecatedTypeFromScriptTag(bool $doRemoveDeprecatedTypeFromScriptTag = true): self
    {
        $this->doRemoveDeprecatedTypeFromScriptTag = $doRemoveDeprecatedTypeFromScriptTag;

        return $this;
    }

    /**
     * @param bool $doRemoveDeprecatedTypeFromStylesheetLink
     *
     * @return $this
     */
    public function doRemoveDeprecatedTypeFromStylesheetLink(bool $doRemoveDeprecatedTypeFromStylesheetLink = true): self
    {
        $this->doRemoveDeprecatedTypeFromStylesheetLink = $doRemoveDeprecatedTypeFromStylesheetLink;

        return $this;
    }

    /**
     * @param bool $doRemoveEmptyAttributes
     *
     * @return $this
     */
    public function doRemoveEmptyAttributes(bool $doRemoveEmptyAttributes = true): self
    {
        $this->doRemoveEmptyAttributes = $doRemoveEmptyAttributes;

        return $this;
    }

    /**
     * @param bool $doRemoveHttpPrefixFromAttributes
     *
     * @return $this
     */
    public function doRemoveHttpPrefixFromAttributes(bool $doRemoveHttpPrefixFromAttributes = true): self
    {
        $this->doRemoveHttpPrefixFromAttributes = $doRemoveHttpPrefixFromAttributes;

        return $this;
    }

    /**
     * @return bool
     */
    public function isDoSortCssClassNames(): bool
    {
        return $this->doSortCssClassNames;
    }

    /**
     * @return bool
     */
    public function isDoSortHtmlAttributes(): bool
    {
        return $this->doSortHtmlAttributes;
    }

    /**
     * @return bool
     */
    public function isDoRemoveDeprecatedScriptCharsetAttribute(): bool
    {
        return $this->doRemoveDeprecatedScriptCharsetAttribute;
    }

    /**
     * @return bool
     */
    public function isDoRemoveDefaultAttributes(): bool
    {
        return $this->doRemoveDefaultAttributes;
    }

    /**
     * @return bool
     */
    public function isDoRemoveDeprecatedAnchorName(): bool
    {
        return $this->doRemoveDeprecatedAnchorName;
    }

    /**
     * @return bool
     */
    public function isDoRemoveDeprecatedTypeFromStylesheetLink(): bool
    {
        return $this->doRemoveDeprecatedTypeFromStylesheetLink;
    }

    /**
     * @return bool
     */
    public function isDoRemoveDeprecatedTypeFromScriptTag(): bool
    {
        return $this->doRemoveDeprecatedTypeFromScriptTag;
    }

    /**
     * @return bool
     */
    public function isDoRemoveValueFromEmptyInput(): bool
    {
        return $this->doRemoveValueFromEmptyInput;
    }

    /**
     * @return bool
     */
    public function isDoRemoveEmptyAttributes(): bool
    {
        return $this->doRemoveEmptyAttributes;
    }

    /**
     * @return bool
     */
    public function isDoSumUpWhitespace(): bool
    {
        return $this->doSumUpWhitespace;
    }

    /**
     * @return bool
     */
    public function isDoRemoveSpacesBetweenTags(): bool
    {
        return $this->doRemoveSpacesBetweenTags;
    }

    /**
     * @return bool
     */
    public function isDoOptimizeViaHtmlDomParser(): bool
    {
        return $this->doOptimizeViaHtmlDomParser;
    }

    /**
     * @return bool
     */
    public function isDoOptimizeAttributes(): bool
    {
        return $this->doOptimizeAttributes;
    }

    /**
     * @return bool
     */
    public function isDoRemoveComments(): bool
    {
        return $this->doRemoveComments;
    }

    /**
     * @return bool
     */
    public function isDoRemoveWhitespaceAroundTags(): bool
    {
        return $this->doRemoveWhitespaceAroundTags;
    }

    /**
     * @return bool
     */
    public function isDoRemoveOmittedQuotes(): bool
    {
        return $this->doRemoveOmittedQuotes;
    }

    /**
     * @return bool
     */
    public function isDoRemoveOmittedHtmlTags(): bool
    {
        return $this->doRemoveOmittedHtmlTags;
    }

    /**
     * @return bool
     */
    public function isDoRemoveHttpPrefixFromAttributes(): bool
    {
        return $this->doRemoveHttpPrefixFromAttributes;
    }

    /**
     * @return array
     */
    public function getDomainsToRemoveHttpPrefixFromAttributes(): array
    {
        return $this->domainsToRemoveHttpPrefixFromAttributes;
    }

    /**
     * @param bool $doRemoveOmittedHtmlTags
     *
     * @return $this
     */
    public function doRemoveOmittedHtmlTags(bool $doRemoveOmittedHtmlTags = true): self
    {
        $this->doRemoveOmittedHtmlTags = $doRemoveOmittedHtmlTags;

        return $this;
    }

    /**
     * @param bool $doRemoveOmittedQuotes
     *
     * @return $this
     */
    public function doRemoveOmittedQuotes(bool $doRemoveOmittedQuotes = true): self
    {
        $this->doRemoveOmittedQuotes = $doRemoveOmittedQuotes;

        return $this;
    }

    /**
     * @param bool $doRemoveSpacesBetweenTags
     *
     * @return $this
     */
    public function doRemoveSpacesBetweenTags(bool $doRemoveSpacesBetweenTags = true): self
    {
        $this->doRemoveSpacesBetweenTags = $doRemoveSpacesBetweenTags;

        return $this;
    }

    /**
     * @param bool $doRemoveValueFromEmptyInput
     *
     * @return $this
     */
    public function doRemoveValueFromEmptyInput(bool $doRemoveValueFromEmptyInput = true): self
    {
        $this->doRemoveValueFromEmptyInput = $doRemoveValueFromEmptyInput;

        return $this;
    }

    /**
     * @param bool $doRemoveWhitespaceAroundTags
     *
     * @return $this
     */
    public function doRemoveWhitespaceAroundTags(bool $doRemoveWhitespaceAroundTags = true): self
    {
        $this->doRemoveWhitespaceAroundTags = $doRemoveWhitespaceAroundTags;

        return $this;
    }

    /**
     * @param bool $doSortCssClassNames
     *
     * @return $this
     */
    public function doSortCssClassNames(bool $doSortCssClassNames = true): self
    {
        $this->doSortCssClassNames = $doSortCssClassNames;

        return $this;
    }

    /**
     * @param bool $doSortHtmlAttributes
     *
     * @return $this
     */
    public function doSortHtmlAttributes(bool $doSortHtmlAttributes = true): self
    {
        $this->doSortHtmlAttributes = $doSortHtmlAttributes;

        return $this;
    }

    /**
     * @param bool $doSumUpWhitespace
     *
     * @return $this
     */
    public function doSumUpWhitespace(bool $doSumUpWhitespace = true): self
    {
        $this->doSumUpWhitespace = $doSumUpWhitespace;

        return $this;
    }

    /**
     * Serialize the attributes of a DOM node into an HTML attribute string.
     *
     * @param \DOMNode $node
     *
     * @return string
     */
    private function domNodeAttributesToString(\DOMNode $node): string
    {
        // Remove quotes around attribute values, when allowed (<p class="foo"> → <p class=foo>)
        $attr_str = '';
        if ($node->attributes !== null) {
            foreach ($node->attributes as $attribute) {
                $attr_str .= $attribute->name;

                if (
                    $this->doOptimizeAttributes
                    &&
                    isset(self::$booleanAttributes[$attribute->name])
                ) {
                    $attr_str .= ' ';

                    continue;
                }

                $attr_str .= '=';

                // http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#attributes-0
                $omit_quotes = $this->doRemoveOmittedQuotes
                               &&
                               $attribute->value !== ''
                               &&
                               \preg_match('/["\'=<>` \t\r\n\f]+/', $attribute->value) === 0;

                $quoteTmp = '"';
                if (
                    !$omit_quotes
                    &&
                    \strpos($attribute->value, '"') !== false
                ) {
                    $quoteTmp = "'";
                }

                if (
                    $this->doOptimizeAttributes
                    &&
                    (
                        $attribute->name === 'srcset'
                        ||
                        $attribute->name === 'sizes'
                    )
                ) {
                    $attr_val = \preg_replace(self::$regExSpace, ' ', $attribute->value);
                } else {
                    $attr_val = $attribute->value;
                }

                $attr_str .= ($omit_quotes ? '' : $quoteTmp) . $attr_val . ($omit_quotes ? '' : $quoteTmp);
                $attr_str .= ' ';
            }
        }

        return \trim($attr_str);
    }

    /**
     * @param \DOMNode $node
     *
     * @return bool
     */
    private function domNodeClosingTagOptional(\DOMNode $node): bool
    {
        $tag_name = $node->nodeName;
        $nextSibling = $this->getNextSiblingOfTypeDOMElement($node);

        // https://html.spec.whatwg.org/multipage/syntax.html#syntax-tag-omission

        // Implemented:
        //
        // A <p> element's end tag may be omitted if the p element is immediately followed by an address, article, aside, blockquote, details, div, dl, fieldset, figcaption, figure, footer, form, h1, h2, h3, h4, h5, h6, header, hgroup, hr, main, menu, nav, ol, p, pre, section, table, or ul element, or if there is no more content in the parent element and the parent element is an HTML element that is not an a, audio, del, ins, map, noscript, or video element, or an autonomous custom element.
        // An <li> element's end tag may be omitted if the li element is immediately followed by another li element or if there is no more content in the parent element.
        // A <td> element's end tag may be omitted if the td element is immediately followed by a td or th element, or if there is no more content in the parent element.
        // An <option> element's end tag may be omitted if the option element is immediately followed by another option element, or if it is immediately followed by an optgroup element, or if there is no more content in the parent element.
        // A <tr> element's end tag may be omitted if the tr element is immediately followed by another tr element, or if there is no more content in the parent element.
        // A <th> element's end tag may be omitted if the th element is immediately followed by a td or th element, or if there is no more content in the parent element.
        // A <dt> element's end tag may be omitted if the dt element is immediately followed by another dt element or a dd element.
        // A <dd> element's end tag may be omitted if the dd element is immediately followed by another dd element or a dt element, or if there is no more content in the parent element.
        // An <rp> element's end tag may be omitted if the rp element is immediately followed by an rt or rp element, or if there is no more content in the parent element.

        // TODO:
        //
        // <html> may be omitted if first thing inside is not comment
        // <head> may be omitted if first thing inside is an element
        // <body> may be omitted if first thing inside is not space, comment, <meta>, <link>, <script>, <style> or <template>
        // <colgroup> may be omitted if first thing inside is <col>
        // <tbody> may be omitted if first thing inside is <tr>
        // An <optgroup> element's end tag may be omitted if the optgroup element is immediately followed by another optgroup element, or if there is no more content in the parent element.
        // A <colgroup> element's start tag may be omitted if the first thing inside the colgroup element is a col element, and if the element is not immediately preceded by another colgroup element whose end tag has been omitted. (It can't be omitted if the element is empty.)
        // A <colgroup> element's end tag may be omitted if the colgroup element is not immediately followed by ASCII whitespace or a comment.
        // A <caption> element's end tag may be omitted if the caption element is not immediately followed by ASCII whitespace or a comment.
        // A <thead> element's end tag may be omitted if the thead element is immediately followed by a tbody or tfoot element.
        // A <tbody> element's start tag may be omitted if the first thing inside the tbody element is a tr element, and if the element is not immediately preceded by a tbody, thead, or tfoot element whose end tag has been omitted. (It can't be omitted if the element is empty.)
        // A <tbody> element's end tag may be omitted if the tbody element is immediately followed by a tbody or tfoot element, or if there is no more content in the parent element.
        // A <tfoot> element's end tag may be omitted if there is no more content in the parent element.
        //
        // <-- However, a start tag must never be omitted if it has any attributes.

        return \in_array($tag_name, self::$optional_end_tags, true)
               ||
               (
                   $tag_name === 'li'
                   &&
                   (
                       $nextSibling === null
                       ||
                       (
                           $nextSibling instanceof \DOMElement
                           &&
                           $nextSibling->tagName === 'li'
                       )
                   )
               )
               ||
               (
                   (
                       $tag_name === 'rp'
                   )
                   &&
                   (
                       $nextSibling === null
                       ||
                       (
                           $nextSibling instanceof \DOMElement
                           &&
                           (
                               $nextSibling->tagName === 'rp'
                               ||
                               $nextSibling->tagName === 'rt'
                           )
                       )
                   )
               )
               ||
               (
                   $tag_name === 'tr'
                   &&
                   (
                       $nextSibling === null
                       ||
                       (
                           $nextSibling instanceof \DOMElement
                           &&
                           $nextSibling->tagName === 'tr'
                       )
                   )
               )
               ||
               (
                   (
                       $tag_name === 'td'
                       ||
                       $tag_name === 'th'
                   )
                   &&
                   (
                       $nextSibling === null
                       ||
                       (
                           $nextSibling instanceof \DOMElement
                           &&
                           (
                               $nextSibling->tagName === 'td'
                               ||
                               $nextSibling->tagName === 'th'
                           )
                       )
                   )
               )
               ||
               (
                   (
                       $tag_name === 'dd'
                       ||
                       $tag_name === 'dt'
                   )
                   &&
                   (
                       (
                           $nextSibling === null
                           &&
                           $tag_name === 'dd'
                       )
                       ||
                       (
                           $nextSibling instanceof \DOMElement
                           &&
                           (
                               $nextSibling->tagName === 'dd'
                               ||
                               $nextSibling->tagName === 'dt'
                           )
                       )
                   )
               )
               ||
               (
                   $tag_name === 'option'
                   &&
                   (
                       $nextSibling === null
                       ||
                       (
                           $nextSibling instanceof \DOMElement
                           &&
                           (
                               $nextSibling->tagName === 'option'
                               ||
                               $nextSibling->tagName === 'optgroup'
                           )
                       )
                   )
               )
               ||
               (
                   $tag_name === 'p'
                   &&
                   (
                       (
                           $nextSibling === null
                           &&
                           (
                               $node->parentNode !== null
                               &&
                               !\in_array(
                                   $node->parentNode->nodeName,
                                   [
                                       'a',
                                       'audio',
                                       'del',
                                       'ins',
                                       'map',
                                       'noscript',
                                       'video',
                                   ],
                                   true
                               )
                           )
                       )
                       ||
                       (
                           $nextSibling instanceof \DOMElement
                           &&
                           \in_array(
                               $nextSibling->tagName,
                               [
                                   'address',
                                   'article',
                                   'aside',
                                   'blockquote',
                                   'dir',
                                   'div',
                                   'dl',
                                   'fieldset',
                                   'footer',
                                   'form',
                                   'h1',
                                   'h2',
                                   'h3',
                                   'h4',
                                   'h5',
                                   'h6',
                                   'header',
                                   'hgroup',
                                   'hr',
                                   'menu',
                                   'nav',
                                   'ol',
                                   'p',
                                   'pre',
                                   'section',
                                   'table',
                                   'ul',
                               ],
                               true
                           )
                       )
                   )
               );
939
    }
940
    protected function domNodeToString(\DOMNode $node): string
    {
        // init
        $html = '';
        $emptyStringTmp = '';

        foreach ($node->childNodes as $child) {
            if ($emptyStringTmp === 'is_empty') {
                $emptyStringTmp = 'last_was_empty';
            } else {
                $emptyStringTmp = '';
            }

            if ($child instanceof \DOMDocumentType) {
                // add the doc-type only if it wasn't generated by DomDocument
                if (!$this->withDocType) {
                    continue;
                }

                if ($child->name) {
                    if (!$child->publicId && $child->systemId) {
                        $tmpTypeSystem = 'SYSTEM';
                        $tmpTypePublic = '';
                    } else {
                        $tmpTypeSystem = '';
                        $tmpTypePublic = 'PUBLIC';
                    }

                    $html .= '<!DOCTYPE ' . $child->name . ''
                             . ($child->publicId ? ' ' . $tmpTypePublic . ' "' . $child->publicId . '"' : '')
                             . ($child->systemId ? ' ' . $tmpTypeSystem . ' "' . $child->systemId . '"' : '')
                             . '>';
                }
            } elseif ($child instanceof \DOMElement) {
                $html .= \rtrim('<' . $child->tagName . ' ' . $this->domNodeAttributesToString($child));
                $html .= '>' . $this->domNodeToString($child);

                if (
                    !$this->doRemoveOmittedHtmlTags
                    ||
                    !$this->domNodeClosingTagOptional($child)
                ) {
                    $html .= '</' . $child->tagName . '>';
                }

                if (!$this->doRemoveWhitespaceAroundTags) {
                    if (
                        $child->nextSibling instanceof \DOMText
                        &&
                        $child->nextSibling->wholeText === ' '
                    ) {
                        if (
                            $emptyStringTmp !== 'last_was_empty'
                            &&
                            \substr($html, -1) !== ' '
                        ) {
                            $html .= ' ';
                        }
                        $emptyStringTmp = 'is_empty';
                    }
                }
            } elseif ($child instanceof \DOMText) {
                if ($child->isElementContentWhitespace()) {
                    if (
                        $child->previousSibling !== null
                        &&
                        $child->nextSibling !== null
                    ) {
                        if (
                            $emptyStringTmp !== 'last_was_empty'
                            &&
                            \substr($html, -1) !== ' '
                        ) {
                            $html .= ' ';
                        }
                        $emptyStringTmp = 'is_empty';
                    }
                } else {
                    $html .= $child->wholeText;
                }
            } elseif ($child instanceof \DOMComment) {
                $html .= '<!--' . $child->textContent . '-->';
            }
        }

        return $html;
    }

    /**
     * @param \DOMNode $node
     *
     * @return \DOMElement|null
     */
    protected function getNextSiblingOfTypeDOMElement(\DOMNode $node)
    {
        do {
            $node = $node->nextSibling;
        } while (!($node === null || $node instanceof \DOMElement));

        return $node;
    }

    /**
     * Check if the current string is a conditional comment.
     *
     * INFO: conditional comments are no longer supported as of IE 10
     *
     * <!--[if expression]> HTML <![endif]-->
     * <![if expression]> HTML <![endif]>
     *
     * @param string $comment
     *
     * @return bool
     */
    private function isConditionalComment($comment): bool
    {
        if (\preg_match('/^\[if [^\]]+\]/', $comment)) {
            return true;
        }

        if (\preg_match('/\[endif\]$/', $comment)) {
            return true;
        }

        return false;
    }

    /**
     * @param string $html
     * @param bool   $decodeUtf8Specials <p>Use this only in special cases, e.g. for PHP 5.3</p>
     *
     * @return string
     */
    public function minify($html, $decodeUtf8Specials = false): string
    {
        $html = (string) $html;
        if (!isset($html[0])) {
            return '';
        }

        $html = \trim($html);
        // strict comparison: a loose "!$html" check would also discard the input "0"
        if ($html === '') {
            return '';
        }

        // init
        static $CACHE_SELF_CLOSING_TAGS = null;
        if ($CACHE_SELF_CLOSING_TAGS === null) {
            $CACHE_SELF_CLOSING_TAGS = \implode('|', self::$selfClosingTags);
        }

        // reset
        $this->protectedChildNodes = [];

        // save old content
        $origHtml = $html;
        $origHtmlLength = \strlen($html);

        // -------------------------------------------------------------------------
        // Minify the HTML via "HtmlDomParser"
        // -------------------------------------------------------------------------

        if ($this->doOptimizeViaHtmlDomParser) {
            $html = $this->minifyHtmlDom($html, $decodeUtf8Specials);
        }

        // -------------------------------------------------------------------------
        // Trim whitespace from html-string. [protected html is still protected]
        // -------------------------------------------------------------------------

        // Remove extra white-space(s) between HTML attribute(s)
        $html = (string) \preg_replace_callback(
            '#<([^/\s<>!]+)(?:\s+([^<>]*?)\s*|\s*)(/?)>#',
            function ($matches) {
                return '<' . $matches[1] . (string) \preg_replace('#([^\s=]+)(\=([\'"]?)(.*?)\3)?(\s+|$)#s', ' $1$2', $matches[2]) . $matches[3] . '>';
            },
            $html
        );

        if ($this->doRemoveSpacesBetweenTags) {
            // Remove spaces that are between > and <
            $html = (string) \preg_replace('/(>) (<)/', '>$2', $html);
        }

        // -------------------------------------------------------------------------
        // Restore protected HTML-code.
        // -------------------------------------------------------------------------

        $html = (string) \preg_replace_callback(
            '/<(?<element>' . $this->protectedChildNodesHelper . ')(?<attributes> [^>]*)?>(?<value>.*?)<\/' . $this->protectedChildNodesHelper . '>/',
            [$this, 'restoreProtectedHtml'],
            $html
        );

        // -------------------------------------------------------------------------
        // Restore protected HTML-entities.
        // -------------------------------------------------------------------------

        if ($this->doOptimizeViaHtmlDomParser) {
            $html = HtmlDomParser::putReplacedBackToPreserveHtmlEntities($html);
        }

        // ------------------------------------
        // Final clean-up
        // ------------------------------------

        $html = \str_replace(
            [
                'html>' . "\n",
                "\n" . '<html',
                'html/>' . "\n",
                "\n" . '</html',
                'head>' . "\n",
                "\n" . '<head',
                'head/>' . "\n",
                "\n" . '</head',
            ],
            [
                'html>',
                '<html',
                'html/>',
                '</html',
                'head>',
                '<head',
                'head/>',
                '</head',
            ],
            $html
        );

        // self closing tags, don't need a trailing slash ...
        $replace = [];
        $replacement = [];
        foreach (self::$selfClosingTags as $selfClosingTag) {
            $replace[] = '<' . $selfClosingTag . '/>';
            $replacement[] = '<' . $selfClosingTag . '>';
            $replace[] = '<' . $selfClosingTag . ' />';
            $replacement[] = '<' . $selfClosingTag . '>';
        }
        $html = \str_replace(
            $replace,
            $replacement,
            $html
        );

        $html = (string) \preg_replace('#<\b(' . $CACHE_SELF_CLOSING_TAGS . ')([^>]*+)><\/\b\1>#', '<\\1\\2>', $html);

        // ------------------------------------
        // check if compression worked
        // ------------------------------------

        if ($origHtmlLength < \strlen($html)) {
            $html = $origHtml;
        }

        return $html;
    }

    /**
     * @param string $html
     * @param bool   $decodeUtf8Specials
     *
     * @return string
     */
    private function minifyHtmlDom($html, $decodeUtf8Specials): string
    {
        // init dom
        $dom = new HtmlDomParser();
        /** @noinspection UnusedFunctionResultInspection */
        $dom->useKeepBrokenHtml($this->keepBrokenHtml);

        $dom->getDocument()->preserveWhiteSpace = false; // remove redundant white space
        $dom->getDocument()->formatOutput = false; // do not format the output with indentation

        // load dom
        /** @noinspection UnusedFunctionResultInspection */
        $dom->loadHtml($html);

        $this->withDocType = (\stripos(\ltrim($html), '<!DOCTYPE') === 0);

        // NOTE (static analysis): the result of $dom->find() is not guaranteed to be
        // traversable; an additional type check may be worth adding.
        foreach ($dom->find('*') as $element) {
            $this->notifyObserversAboutDomElementBeforeMinification($element);
        }

        // -------------------------------------------------------------------------
        // Protect HTML tags and conditional comments.
        // -------------------------------------------------------------------------

        $dom = $this->protectTags($dom);

        // -------------------------------------------------------------------------
        // Remove default HTML comments. [protected html is still protected]
        // -------------------------------------------------------------------------

        if ($this->doRemoveComments) {
            $dom = $this->removeComments($dom);
        }

        // -------------------------------------------------------------------------
        // Sum-Up extra whitespace from the Dom. [protected html is still protected]
        // -------------------------------------------------------------------------

        if ($this->doSumUpWhitespace) {
            $dom = $this->sumUpWhitespace($dom);
        }

        foreach ($dom->find('*') as $element) {

            // -------------------------------------------------------------------------
            // Remove whitespace around tags. [protected html is still protected]
            // -------------------------------------------------------------------------

            if ($this->doRemoveWhitespaceAroundTags) {
                $this->removeWhitespaceAroundTags($element);
            }

            $this->notifyObserversAboutDomElementAfterMinification($element);
        }

        // -------------------------------------------------------------------------
        // Convert the Dom into a string.
        // -------------------------------------------------------------------------

        return $dom->fixHtmlOutput(
            $this->domNodeToString($dom->getDocument()),
            $decodeUtf8Specials
        );
    }

    /**
     * Prevent changes of inline "styles" and "scripts".
     *
     * @param HtmlDomParser $dom
     *
     * @return HtmlDomParser
     */
    private function protectTags(HtmlDomParser $dom): HtmlDomParser
    {
        // init
        $counter = 0;

        foreach ($dom->find('script, style') as $element) {

            // skip external links
            if ($element->tag === 'script' || $element->tag === 'style') {
                $attributes = $element->getAllAttributes();
                if (isset($attributes['src'])) {
                    continue;
                }
            }

            $this->protectedChildNodes[$counter] = $element->text();
            $element->getNode()->nodeValue = '<' . $this->protectedChildNodesHelper . ' data-' . $this->protectedChildNodesHelper . '="' . $counter . '"></' . $this->protectedChildNodesHelper . '>';

            ++$counter;
        }

        foreach ($dom->find('code, nocompress') as $element) {
            if ($element->isRemoved()) {
                continue;
            }

            $this->protectedChildNodes[$counter] = $element->parentNode()->innerHtml();
            $element->getNode()->parentNode->nodeValue = '<' . $this->protectedChildNodesHelper . ' data-' . $this->protectedChildNodesHelper . '="' . $counter . '"></' . $this->protectedChildNodesHelper . '>';

            ++$counter;
        }

        foreach ($dom->find('//comment()') as $element) {
            $text = $element->text();

            // skip normal comments
            if (!$this->isConditionalComment($text)) {
                continue;
            }

            $this->protectedChildNodes[$counter] = '<!--' . $text . '-->';

            /* @var $node \DOMComment */
            $node = $element->getNode();
            $child = new \DOMText('<' . $this->protectedChildNodesHelper . ' data-' . $this->protectedChildNodesHelper . '="' . $counter . '"></' . $this->protectedChildNodesHelper . '>');
            /** @noinspection UnusedFunctionResultInspection */
            $element->getNode()->parentNode->replaceChild($child, $node);

            ++$counter;
        }

        return $dom;
    }

    /**
     * Remove comments in the dom.
     *
     * @param HtmlDomParser $dom
     *
     * @return HtmlDomParser
     */
    private function removeComments(HtmlDomParser $dom): HtmlDomParser
    {
        foreach ($dom->find('//comment()') as $commentWrapper) {
            $comment = $commentWrapper->getNode();
            $val = $comment->nodeValue;
            if (\strpos($val, '[') === false) {
                /** @noinspection UnusedFunctionResultInspection */
                $comment->parentNode->removeChild($comment);
            }
        }

        $dom->getDocument()->normalizeDocument();

        return $dom;
    }

    /**
     * Trim tags in the dom.
     *
     * @param SimpleHtmlDom $element
     *
     * @return void
     */
    private function removeWhitespaceAroundTags(SimpleHtmlDom $element)
    {
        if (isset(self::$trimWhitespaceFromTags[$element->tag])) {
            $node = $element->getNode();

            /** @var \DOMNode[] $candidates */
            $candidates = [];
            if ($node->childNodes->length > 0) {
                $candidates[] = $node->firstChild;
                $candidates[] = $node->lastChild;
                $candidates[] = $node->previousSibling;
                $candidates[] = $node->nextSibling;
            }

            // iterate by value: DOM nodes are object handles, so no reference is needed
            foreach ($candidates as $candidate) {
                if ($candidate === null) {
                    continue;
                }

                if ($candidate->nodeType === \XML_TEXT_NODE) {
                    $candidate->nodeValue = \preg_replace(self::$regExSpace, ' ', $candidate->nodeValue);
                }
            }
        }
    }

    /**
     * Callback function for preg_replace_callback use.
     *
     * @param array $matches PREG matches
     *
     * @return string
     */
    private function restoreProtectedHtml($matches): string
    {
        \preg_match('/.*"(?<id>\d*)"/', $matches['attributes'], $matchesInner);

        $html = '';
        if (isset($this->protectedChildNodes[$matchesInner['id']])) {
            $html .= $this->protectedChildNodes[$matchesInner['id']];
        }

        return $html;
    }

    /**
     * @param array $domainsToRemoveHttpPrefixFromAttributes
     *
     * @return $this
     */
    public function setDomainsToRemoveHttpPrefixFromAttributes($domainsToRemoveHttpPrefixFromAttributes): self
    {
        $this->domainsToRemoveHttpPrefixFromAttributes = $domainsToRemoveHttpPrefixFromAttributes;

        return $this;
    }

    /**
     * Sum-up extra whitespace from dom-nodes.
     *
     * @param HtmlDomParser $dom
     *
     * @return HtmlDomParser
     */
    private function sumUpWhitespace(HtmlDomParser $dom): HtmlDomParser
    {
        $text_nodes = $dom->find('//text()');
        foreach ($text_nodes as $text_node_wrapper) {
            /* @var $text_node \DOMNode */
            $text_node = $text_node_wrapper->getNode();
            $xp = $text_node->getNodePath();

            $doSkip = false;
            foreach (self::$skipTagsForRemoveWhitespace as $pattern) {
                // "{$pattern}" instead of "${pattern}": the latter is deprecated as of PHP 8.2
                if (\strpos($xp, "/{$pattern}") !== false) {
                    $doSkip = true;

                    break;
                }
            }
            if ($doSkip) {
                continue;
            }

            $text_node->nodeValue = \preg_replace(self::$regExSpace, ' ', $text_node->nodeValue);
        }

        $dom->getDocument()->normalizeDocument();

        return $dom;
    }

    /**
     * WARNING: enabling this may hurt performance.
     *
     * @param bool $keepBrokenHtml
     *
     * @return $this
     */
    public function useKeepBrokenHtml(bool $keepBrokenHtml): self
    {
        $this->keepBrokenHtml = $keepBrokenHtml;

        return $this;
    }
}
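
The protect/minify/restore flow used by `minify()`, `protectTags()` and `restoreProtectedHtml()` can be illustrated with a small stand-alone sketch. This is not the class's implementation: the function name `minifyKeepingScripts` and the placeholder tag `x-protected` are hypothetical, and the sketch uses plain regular expressions where the class works on a parsed DOM. It only shows the idea of swapping sensitive content for numbered placeholders, minifying, and then restoring the content by id.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch of the protect/restore idea (not the HtmlMin code):
 * inline <script> bodies are replaced by numbered placeholders, whitespace
 * is collapsed, and the original bodies are put back afterwards.
 */
function minifyKeepingScripts(string $html): string
{
    $protected = [];

    // 1. Protect: move each script body into $protected and leave a placeholder.
    $html = (string) \preg_replace_callback(
        '#(<script\b[^>]*>)(.*?)(</script>)#is',
        static function (array $m) use (&$protected): string {
            $id = \count($protected);
            $protected[$id] = $m[2];

            return $m[1] . '<x-protected data-id="' . $id . '"></x-protected>' . $m[3];
        },
        $html
    );

    // 2. Minify: collapse every whitespace run into a single space.
    $html = (string) \preg_replace('/\s+/', ' ', $html);

    // 3. Restore: look up each placeholder id and re-insert the untouched body.
    return (string) \preg_replace_callback(
        '#<x-protected data-id="(\d+)"></x-protected>#',
        static function (array $m) use ($protected): string {
            return $protected[(int) $m[1]] ?? '';
        },
        $html
    );
}

$input = "<p>a   b</p>\n<script>\nvar x = 1;\n\nvar y = 2;\n</script>";
echo minifyKeepingScripts($input);
```

The paragraph's whitespace is collapsed while the script body keeps its line breaks, because the body is absent from the string during step 2. Keying the stored content by a counter, as the class does with `$this->protectedChildNodes`, keeps the restore step a simple array lookup.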