Completed
Push — master ( 6e7e9c...46a8a1 )
by Lars
01:38
created

HtmlMin   D

Complexity

Total Complexity 193

Size/Duplication

Total Lines 1376
Duplicated Lines 1.45 %

Coupling/Cohesion

Components 1
Dependencies 3

Test Coverage

Coverage 94.85%

Importance

Changes 0
Metric Value
wmc 193
lcom 1
cbo 3
dl 20
loc 1376
ccs 350
cts 369
cp 0.9485
rs 4.4102
c 0
b 0
f 0

35 Methods

Rating   Name   Duplication   Size   Complexity  
A __construct() 0 3 1
A doOptimizeAttributes() 0 6 1
A doOptimizeViaHtmlDomParser() 0 6 1
A doRemoveComments() 0 6 1
A doRemoveDefaultAttributes() 0 6 1
A doRemoveDeprecatedAnchorName() 0 6 1
A doRemoveDeprecatedScriptCharsetAttribute() 0 6 1
A doRemoveDeprecatedTypeFromScriptTag() 0 6 1
A doRemoveDeprecatedTypeFromStylesheetLink() 0 6 1
A doRemoveEmptyAttributes() 0 6 1
A doRemoveHttpPrefixFromAttributes() 0 6 1
A doRemoveSpacesBetweenTags() 0 6 1
A doRemoveValueFromEmptyInput() 0 6 1
A doRemoveWhitespaceAroundTags() 0 6 1
A doRemoveOmittedQuotes() 0 6 1
A doRemoveOmittedHtmlTags() 0 6 1
A doSortCssClassNames() 0 6 1
A doSortHtmlAttributes() 0 6 1
A doSumUpWhitespace() 0 6 1
D domNodeAttributesToString() 0 34 9
F domNodeClosingTagOptional() 0 219 38
A getNextSiblingOfTypeDOMElement() 0 8 3
A isConditionalComment() 0 12 3
D minify() 0 127 9
B minifyHtmlDom() 0 64 6
C optimizeAttributes() 0 59 16
C protectTags() 0 45 7
D removeAttributeHelper() 20 66 42
A removeComments() 0 14 3
B removeWhitespaceAroundTags() 0 24 6
A restoreProtectedHtml() 0 11 2
A setDomainsToRemoveHttpPrefixFromAttributes() 0 6 1
B sortCssClassNames() 0 24 5
B sumUpWhitespace() 0 26 5
C domNodeToString() 0 72 19

How to fix

Duplicated Code

Duplicate code is one of the most pungent code smells. A rule that is often used is to re-structure code once it is duplicated in three or more places.

Common duplication problems and their corresponding solutions are:

Complex Class

Tip: Before tackling complexity, eliminate any duplication first. This can often reduce the size of classes significantly.

Complex classes like HtmlMin often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common way to find such a component is to look for fields and methods that share the same prefix or suffix. In HtmlMin, for example, domNodeAttributesToString(), domNodeClosingTagOptional() and domNodeToString() share the domNode prefix. You can also look at the cohesion graph to spot any unconnected or weakly connected components.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
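As a rough illustration of Extract Class, the sketch below moves part of the end-tag logic out of HtmlMin into its own class. The class name OptionalEndTagRules, its lookup table, and its method names are hypothetical and do not exist in the reviewed code. Only the li, tr, td, th, rp and option rules are modelled; the p, dd and dt rules have extra conditions and are left out.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical class extracted from HtmlMin (not part of the reviewed code).
 * It answers one question: may the end tag of this element be omitted?
 * A lookup table replaces the long boolean expression for the simple rules.
 */
final class OptionalEndTagRules
{
    /** Element name => next-sibling element names that allow omitting the end tag. */
    private const FOLLOWED_BY = [
        'li'     => ['li'],
        'tr'     => ['tr'],
        'td'     => ['td', 'th'],
        'th'     => ['td', 'th'],
        'rp'     => ['rp', 'rt'],
        'option' => ['option', 'optgroup'],
    ];

    public function isOptional(\DOMElement $element): bool
    {
        $name = $element->tagName;
        if (!isset(self::FOLLOWED_BY[$name])) {
            // Elements outside the table are not handled by this sketch.
            return false;
        }

        $next = $this->nextElementSibling($element);

        // No further element in the parent: the end tag may be omitted.
        if ($next === null) {
            return true;
        }

        return \in_array($next->tagName, self::FOLLOWED_BY[$name], true);
    }

    /** Skips text and comment nodes until the next element (or null). */
    private function nextElementSibling(\DOMElement $element): ?\DOMElement
    {
        $node = $element->nextSibling;
        while ($node !== null && !$node instanceof \DOMElement) {
            $node = $node->nextSibling;
        }

        return $node;
    }
}

// Usage: the first <li> is followed by another <li>, the second by a <p>.
$doc = new \DOMDocument();
$doc->loadXML('<ul><li>a</li><li>b</li><p>c</p></ul>');
$items = $doc->getElementsByTagName('li');
$rules = new OptionalEndTagRules();

\var_dump($rules->isOptional($items->item(0)));
\var_dump($rules->isOptional($items->item(1)));
```

Typing the parameter as \DOMElement instead of \DOMNode also means tagName is always defined on the argument, which is one possible way to address the analyzer's complaint about DOMNode.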

While breaking up the class, it is a good idea to analyze how other classes use HtmlMin and, based on these observations, to apply Extract Interface as well.
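A minimal sketch of Extract Interface follows, assuming callers only need a single minify() call. The names HtmlMinifierInterface, WhitespaceCollapsingMinifier and PageRenderer are hypothetical. The stand-in implementation only collapses whitespace and is not the HtmlMin algorithm, and the one-argument minify() signature is a simplification of the reviewed method.

```php
<?php

declare(strict_types=1);

/** Hypothetical interface: the one method callers of HtmlMin are assumed to need. */
interface HtmlMinifierInterface
{
    public function minify(string $html): string;
}

/** Stand-in implementation for this sketch; the real class would be HtmlMin. */
final class WhitespaceCollapsingMinifier implements HtmlMinifierInterface
{
    public function minify(string $html): string
    {
        // Collapse every run of whitespace into one space (not the real algorithm).
        return \trim((string) \preg_replace('/\s+/', ' ', $html));
    }
}

/** Hypothetical caller that depends on the interface, not on a concrete class. */
final class PageRenderer
{
    /** @var HtmlMinifierInterface */
    private $minifier;

    public function __construct(HtmlMinifierInterface $minifier)
    {
        $this->minifier = $minifier;
    }

    public function render(string $body): string
    {
        return $this->minifier->minify("<html>\n  <body>" . $body . "</body>\n</html>");
    }
}

$renderer = new PageRenderer(new WhitespaceCollapsingMinifier());
echo $renderer->render('<p>hi</p>'), "\n";
```

Because PageRenderer only sees the interface, a test double or a differently configured minifier can be swapped in without touching the caller.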

<?php

declare(strict_types=1);

namespace voku\helper;

/**
 * Class HtmlMin
 *
 * Inspired by:
 * - JS: https://github.com/kangax/html-minifier/blob/gh-pages/src/htmlminifier.js
 * - PHP: https://github.com/searchturbine/phpwee-php-minifier
 * - PHP: https://github.com/WyriHaximus/HtmlCompress
 * - PHP: https://github.com/zaininnari/html-minifier
 * - PHP: https://github.com/ampaze/PHP-HTML-Minifier
 * - Java: https://code.google.com/archive/p/htmlcompressor/
 *
 * Ideas:
 * - http://perfectionkills.com/optimizing-html/
 *
 * @package voku\helper
 */
class HtmlMin
{
  /**
   * @var string
   */
  private static $regExSpace = "/[[:space:]]{2,}|[\r\n]+/u";

  /**
   * @var array
   */
  private static $optional_end_tags = [
      'html',
      'head',
      'body',
  ];

  /**
   * // https://mathiasbynens.be/demo/javascript-mime-type
   * // https://developer.mozilla.org/en/docs/Web/HTML/Element/script#attr-type
   *
   * @var array
   */
  private static $executableScriptsMimeTypes = [
      'text/javascript'          => '',
      'text/ecmascript'          => '',
      'text/jscript'             => '',
      'application/javascript'   => '',
      'application/x-javascript' => '',
      'application/ecmascript'   => '',
  ];

  private static $selfClosingTags = [
      'area',
      'base',
      'basefont',
      'br',
      'col',
      'command',
      'embed',
      'frame',
      'hr',
      'img',
      'input',
      'isindex',
      'keygen',
      'link',
      'meta',
      'param',
      'source',
      'track',
      'wbr',
  ];

  private static $trimWhitespaceFromTags = [
      'article' => '',
      'br'      => '',
      'div'     => '',
      'footer'  => '',
      'hr'      => '',
      'nav'     => '',
      'p'       => '',
      'script'  => '',
  ];

  /**
   * @var array
   */
  private static $booleanAttributes = [
      'allowfullscreen' => '',
      'async'           => '',
      'autofocus'       => '',
      'autoplay'        => '',
      'checked'         => '',
      'compact'         => '',
      'controls'        => '',
      'declare'         => '',
      'default'         => '',
      'defaultchecked'  => '',
      'defaultmuted'    => '',
      'defaultselected' => '',
      'defer'           => '',
      'disabled'        => '',
      'enabled'         => '',
      'formnovalidate'  => '',
      'hidden'          => '',
      'indeterminate'   => '',
      'inert'           => '',
      'ismap'           => '',
      'itemscope'       => '',
      'loop'            => '',
      'multiple'        => '',
      'muted'           => '',
      'nohref'          => '',
      'noresize'        => '',
      'noshade'         => '',
      'novalidate'      => '',
      'nowrap'          => '',
      'open'            => '',
      'pauseonexit'     => '',
      'readonly'        => '',
      'required'        => '',
      'reversed'        => '',
      'scoped'          => '',
      'seamless'        => '',
      'selected'        => '',
      'sortable'        => '',
      'truespeed'       => '',
      'typemustmatch'   => '',
      'visible'         => '',
  ];
  /**
   * @var array
   */
  private static $skipTagsForRemoveWhitespace = [
      'code',
      'pre',
      'script',
      'style',
      'textarea',
  ];

  /**
   * @var array
   */
  private $protectedChildNodes = [];

  /**
   * @var string
   */
  private $protectedChildNodesHelper = 'html-min--voku--saved-content';

  /**
   * @var bool
   */
  private $doOptimizeViaHtmlDomParser = true;

  /**
   * @var bool
   */
  private $doOptimizeAttributes = true;

  /**
   * @var bool
   */
  private $doRemoveComments = true;

  /**
   * @var bool
   */
  private $doRemoveWhitespaceAroundTags = true;

  /**
   * @var bool
   */
  private $doRemoveOmittedQuotes = true;

  /**
   * @var bool
   */
  private $doRemoveOmittedHtmlTags = true;

  /**
   * @var bool
   */
  private $doRemoveHttpPrefixFromAttributes = false;

  /**
   * @var array
   */
  private $domainsToRemoveHttpPrefixFromAttributes = [
      'google.com',
      'google.de',
  ];

  /**
   * @var bool
   */
  private $doSortCssClassNames = true;

  /**
   * @var bool
   */
  private $doSortHtmlAttributes = true;

  /**
   * @var bool
   */
  private $doRemoveDeprecatedScriptCharsetAttribute = true;

  /**
   * @var bool
   */
  private $doRemoveDefaultAttributes = false;

  /**
   * @var bool
   */
  private $doRemoveDeprecatedAnchorName = true;

  /**
   * @var bool
   */
  private $doRemoveDeprecatedTypeFromStylesheetLink = true;

  /**
   * @var bool
   */
  private $doRemoveDeprecatedTypeFromScriptTag = true;

  /**
   * @var bool
   */
  private $doRemoveValueFromEmptyInput = true;

  /**
   * @var bool
   */
  private $doRemoveEmptyAttributes = true;

  /**
   * @var bool
   */
  private $doSumUpWhitespace = true;

  /**
   * @var bool
   */
  private $doRemoveSpacesBetweenTags = false;

  /**
   * @var
   */
  private $withDocType;

  /**
   * HtmlMin constructor.
   */
  public function __construct()
  {
  }

  /**
   * @param boolean $doOptimizeAttributes
   *
   * @return $this
   */
  public function doOptimizeAttributes(bool $doOptimizeAttributes = true)
  {
    $this->doOptimizeAttributes = $doOptimizeAttributes;

    return $this;
  }

  /**
   * @param boolean $doOptimizeViaHtmlDomParser
   *
   * @return $this
   */
  public function doOptimizeViaHtmlDomParser(bool $doOptimizeViaHtmlDomParser = true)
  {
    $this->doOptimizeViaHtmlDomParser = $doOptimizeViaHtmlDomParser;

    return $this;
  }

  /**
   * @param boolean $doRemoveComments
   *
   * @return $this
   */
  public function doRemoveComments(bool $doRemoveComments = true)
  {
    $this->doRemoveComments = $doRemoveComments;

    return $this;
  }

  /**
   * @param boolean $doRemoveDefaultAttributes
   *
   * @return $this
   */
  public function doRemoveDefaultAttributes(bool $doRemoveDefaultAttributes = true)
  {
    $this->doRemoveDefaultAttributes = $doRemoveDefaultAttributes;

    return $this;
  }

  /**
   * @param boolean $doRemoveDeprecatedAnchorName
   *
   * @return $this
   */
  public function doRemoveDeprecatedAnchorName(bool $doRemoveDeprecatedAnchorName = true)
  {
    $this->doRemoveDeprecatedAnchorName = $doRemoveDeprecatedAnchorName;

    return $this;
  }

  /**
   * @param boolean $doRemoveDeprecatedScriptCharsetAttribute
   *
   * @return $this
   */
  public function doRemoveDeprecatedScriptCharsetAttribute(bool $doRemoveDeprecatedScriptCharsetAttribute = true)
  {
    $this->doRemoveDeprecatedScriptCharsetAttribute = $doRemoveDeprecatedScriptCharsetAttribute;

    return $this;
  }

  /**
   * @param boolean $doRemoveDeprecatedTypeFromScriptTag
   *
   * @return $this
   */
  public function doRemoveDeprecatedTypeFromScriptTag(bool $doRemoveDeprecatedTypeFromScriptTag = true)
  {
    $this->doRemoveDeprecatedTypeFromScriptTag = $doRemoveDeprecatedTypeFromScriptTag;

    return $this;
  }

  /**
   * @param boolean $doRemoveDeprecatedTypeFromStylesheetLink
   *
   * @return $this
   */
  public function doRemoveDeprecatedTypeFromStylesheetLink(bool $doRemoveDeprecatedTypeFromStylesheetLink = true)
  {
    $this->doRemoveDeprecatedTypeFromStylesheetLink = $doRemoveDeprecatedTypeFromStylesheetLink;

    return $this;
  }

  /**
   * @param boolean $doRemoveEmptyAttributes
   *
   * @return $this
   */
  public function doRemoveEmptyAttributes(bool $doRemoveEmptyAttributes = true)
  {
    $this->doRemoveEmptyAttributes = $doRemoveEmptyAttributes;

    return $this;
  }

  /**
   * @param boolean $doRemoveHttpPrefixFromAttributes
   *
   * @return $this
   */
  public function doRemoveHttpPrefixFromAttributes(bool $doRemoveHttpPrefixFromAttributes = true)
  {
    $this->doRemoveHttpPrefixFromAttributes = $doRemoveHttpPrefixFromAttributes;

    return $this;
  }

  /**
   * @param boolean $doRemoveSpacesBetweenTags
   *
   * @return $this
   */
  public function doRemoveSpacesBetweenTags(bool $doRemoveSpacesBetweenTags = true)
  {
    $this->doRemoveSpacesBetweenTags = $doRemoveSpacesBetweenTags;

    return $this;
  }

  /**
   * @param boolean $doRemoveValueFromEmptyInput
   *
   * @return $this
   */
  public function doRemoveValueFromEmptyInput(bool $doRemoveValueFromEmptyInput = true)
  {
    $this->doRemoveValueFromEmptyInput = $doRemoveValueFromEmptyInput;

    return $this;
  }

  /**
   * @param boolean $doRemoveWhitespaceAroundTags
   *
   * @return $this
   */
  public function doRemoveWhitespaceAroundTags(bool $doRemoveWhitespaceAroundTags = true)
  {
    $this->doRemoveWhitespaceAroundTags = $doRemoveWhitespaceAroundTags;

    return $this;
  }

  /**
   * @param bool $doRemoveOmittedQuotes
   *
   * @return $this
   */
  public function doRemoveOmittedQuotes(bool $doRemoveOmittedQuotes = true)
  {
    $this->doRemoveOmittedQuotes = $doRemoveOmittedQuotes;

    return $this;
  }

  /**
   * @param bool $doRemoveOmittedHtmlTags
   *
   * @return $this
   */
  public function doRemoveOmittedHtmlTags(bool $doRemoveOmittedHtmlTags = true)
  {
    $this->doRemoveOmittedHtmlTags = $doRemoveOmittedHtmlTags;

    return $this;
  }

  /**
   * @param boolean $doSortCssClassNames
   *
   * @return $this
   */
  public function doSortCssClassNames(bool $doSortCssClassNames = true)
  {
    $this->doSortCssClassNames = $doSortCssClassNames;

    return $this;
  }

  /**
   * @param boolean $doSortHtmlAttributes
   *
   * @return $this
   */
  public function doSortHtmlAttributes(bool $doSortHtmlAttributes = true)
  {
    $this->doSortHtmlAttributes = $doSortHtmlAttributes;

    return $this;
  }

  /**
   * @param boolean $doSumUpWhitespace
   *
   * @return $this
   */
  public function doSumUpWhitespace(bool $doSumUpWhitespace = true)
  {
    $this->doSumUpWhitespace = $doSumUpWhitespace;

    return $this;
  }

  private function domNodeAttributesToString(\DOMNode $node): string
  {
    # Remove quotes around attribute values, when allowed (<p class="foo"> → <p class=foo>)
    $attrstr = '';
    if ($node->attributes != null) {
      foreach ($node->attributes as $attribute) {
        $attrstr .= $attribute->name;

        if (
            $this->doOptimizeAttributes === true
            &&
            isset(self::$booleanAttributes[$attribute->name])
        ) {
          $attrstr .= ' ';
          continue;
        }

        $attrstr .= '=';

        # http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#attributes-0
        $omitquotes = $this->doRemoveOmittedQuotes
                      &&
                      $attribute->value != ''
                      &&
                      0 == \preg_match('/["\'=<>` \t\r\n\f]+/', $attribute->value);

        $attr_val = $attribute->value;
        $attrstr .= ($omitquotes ? '' : '"') . $attr_val . ($omitquotes ? '' : '"');
        $attrstr .= ' ';
      }
    }

    return \trim($attrstr);
  }

  private function domNodeClosingTagOptional(\DOMNode $node): bool
  {
    $tag_name = $node->tagName;
Bug introduced by
The property tagName does not seem to exist in DOMNode.

An attempt at access to an undefined property has been detected. This may either be a typographical error or the property has been renamed but there are still references to its old name.

If you really want to allow access to undefined properties, you can define magic methods to allow access. See the php core documentation on Overloading.

    $nextSibling = $this->getNextSiblingOfTypeDOMElement($node);

    // https://html.spec.whatwg.org/multipage/syntax.html#syntax-tag-omission

    // Implemented:
    //
    // A <p> element's end tag may be omitted if the p element is immediately followed by an address, article, aside, blockquote, details, div, dl, fieldset, figcaption, figure, footer, form, h1, h2, h3, h4, h5, h6, header, hgroup, hr, main, menu, nav, ol, p, pre, section, table, or ul element, or if there is no more content in the parent element and the parent element is an HTML element that is not an a, audio, del, ins, map, noscript, or video element, or an autonomous custom element.
    // An <li> element's end tag may be omitted if the li element is immediately followed by another li element or if there is no more content in the parent element.
    // A <td> element's end tag may be omitted if the td element is immediately followed by a td or th element, or if there is no more content in the parent element.
    // An <option> element's end tag may be omitted if the option element is immediately followed by another option element, or if it is immediately followed by an optgroup element, or if there is no more content in the parent element.
    // A <tr> element's end tag may be omitted if the tr element is immediately followed by another tr element, or if there is no more content in the parent element.
    // A <th> element's end tag may be omitted if the th element is immediately followed by a td or th element, or if there is no more content in the parent element.
    // A <dt> element's end tag may be omitted if the dt element is immediately followed by another dt element or a dd element.
    // A <dd> element's end tag may be omitted if the dd element is immediately followed by another dd element or a dt element, or if there is no more content in the parent element.
    // An <rp> element's end tag may be omitted if the rp element is immediately followed by an rt or rp element, or if there is no more content in the parent element.

    // TODO:
    //
    // <html> may be omitted if first thing inside is not comment
    // <head> may be omitted if first thing inside is an element
    // <body> may be omitted if first thing inside is not space, comment, <meta>, <link>, <script>, <style> or <template>
    // <colgroup> may be omitted if first thing inside is <col>
    // <tbody> may be omitted if first thing inside is <tr>
    // An <optgroup> element's end tag may be omitted if the optgroup element is immediately followed by another optgroup element, or if there is no more content in the parent element.
    // A <colgroup> element's start tag may be omitted if the first thing inside the colgroup element is a col element, and if the element is not immediately preceded by another colgroup element whose end tag has been omitted. (It can't be omitted if the element is empty.)
    // A <colgroup> element's end tag may be omitted if the colgroup element is not immediately followed by ASCII whitespace or a comment.
    // A <caption> element's end tag may be omitted if the caption element is not immediately followed by ASCII whitespace or a comment.
    // A <thead> element's end tag may be omitted if the thead element is immediately followed by a tbody or tfoot element.
    // A <tbody> element's start tag may be omitted if the first thing inside the tbody element is a tr element, and if the element is not immediately preceded by a tbody, thead, or tfoot element whose end tag has been omitted. (It can't be omitted if the element is empty.)
    // A <tbody> element's end tag may be omitted if the tbody element is immediately followed by a tbody or tfoot element, or if there is no more content in the parent element.
    // A <tfoot> element's end tag may be omitted if there is no more content in the parent element.
    //
    // <-- However, a start tag must never be omitted if it has any attributes.

    return \in_array($tag_name, self::$optional_end_tags, true)
           ||
           (
               $tag_name == 'li'
               &&
               (
                   $nextSibling === null
                   ||
                   (
                       $nextSibling instanceof \DOMElement
                       &&
                       $nextSibling->tagName == 'li'
                   )
               )
           )
           ||
           (
               (
                   $tag_name == 'rp'
               )
               &&
               (
                   $nextSibling === null
                   ||
                   (
                       $nextSibling instanceof \DOMElement
                       &&
                       (
                           $nextSibling->tagName == 'rp'
                           ||
                           $nextSibling->tagName == 'rt'
                       )
                   )
               )
           )
           ||
           (
               $tag_name == 'tr'
               &&
               (
                   $nextSibling === null
                   ||
                   (
                       $nextSibling instanceof \DOMElement
                       &&
                       $nextSibling->tagName == 'tr'
                   )
               )
           )
           ||
           (
               (
                   $tag_name == 'td'
                   ||
                   $tag_name == 'th'
               )
               &&
               (
                   $nextSibling === null
                   ||
                   (
                       $nextSibling instanceof \DOMElement
                       &&
                       (
                           $nextSibling->tagName == 'td'
                           ||
                           $nextSibling->tagName == 'th'
                       )
                   )
               )
           )
           ||
           (
               (
                   $tag_name == 'dd'
                   ||
                   $tag_name == 'dt'
               )
               &&
               (
                   (
                       $nextSibling === null
                       &&
                       $tag_name == 'dd'
                   )
                   ||
                   (
                       $nextSibling instanceof \DOMElement
                       &&
                       (
                           $nextSibling->tagName == 'dd'
                           ||
                           $nextSibling->tagName == 'dt'
                       )
                   )
               )
           )
           ||
           (
               $tag_name == 'option'
               &&
               (
                   $nextSibling === null
                   ||
                   (
                       $nextSibling instanceof \DOMElement
                       &&
                       (
                           $nextSibling->tagName == 'option'
                           ||
                           $nextSibling->tagName == 'optgroup'
                       )
                   )
               )
           )
           ||
           (
               $tag_name == 'p'
               &&
               (
                   (
                       $nextSibling === null
                       &&
                       (
                           $node->parentNode !== null
                           &&
                           !\in_array(
                               $node->parentNode->tagName,
                               [
                                   'a',
                                   'audio',
                                   'del',
                                   'ins',
                                   'map',
                                   'noscript',
                                   'video',
                               ],
                               true
                           )
                       )
                   )
                   ||
                   (
                       $nextSibling instanceof \DOMElement
                       &&
                       \in_array(
                           $nextSibling->tagName,
                           [
                               'address',
                               'article',
                               'aside',
                               'blockquote',
                               'dir',
                               'div',
                               'dl',
                               'fieldset',
                               'footer',
                               'form',
                               'h1',
                               'h2',
                               'h3',
                               'h4',
                               'h5',
                               'h6',
                               'header',
                               'hgroup',
                               'hr',
                               'menu',
                               'nav',
                               'ol',
                               'p',
                               'pre',
                               'section',
                               'table',
                               'ul',
                           ],
                           true
                       )
                   )
               )
           );
  }

  protected function domNodeToString(\DOMNode $node): string
  {
    // init
    $html = '';

    foreach ($node->childNodes as $child) {

      if ($child instanceof \DOMDocumentType) {

        // add the doc-type only if it wasn't generated by DomDocument
        if ($this->withDocType !== true) {
          continue;
        }

        if ($child->name) {

          if (!$child->publicId && $child->systemId) {
            $tmpTypeSystem = 'SYSTEM';
            $tmpTypePublic = '';
          } else {
            $tmpTypeSystem = '';
            $tmpTypePublic = 'PUBLIC';
          }

          $html .= '<!DOCTYPE ' . $child->name . ''
                   . ($child->publicId ? ' ' . $tmpTypePublic . ' "' . $child->publicId . '"' : '')
                   . ($child->systemId ? ' ' . $tmpTypeSystem . ' "' . $child->systemId . '"' : '')
                   . '>';
        }

      } elseif ($child instanceof \DOMElement) {

        $html .= trim('<' . $child->tagName . ' ' . $this->domNodeAttributesToString($child));
        $html .= '>' . $this->domNodeToString($child);

        if (
            $this->doRemoveOmittedHtmlTags === false
            ||
            !$this->domNodeClosingTagOptional($child)
        ) {
          $html .= '</' . $child->tagName . '>';
        }

        if ($this->doRemoveWhitespaceAroundTags === false) {
          if ($child->nextSibling instanceof \DOMText) {
            $html .= ' ';
          }
        }

      } elseif ($child instanceof \DOMText) {

        if ($child->isWhitespaceInElementContent()) {
          if (
              $child->previousSibling !== null
              &&
              $child->nextSibling !== null
          ) {
            $html .= ' ';
          }
        } else {
          $html .= $child->wholeText;
        }

      } elseif ($child instanceof \DOMComment) {

        $html .= $child->wholeText;

      }
    }

    return $html;
  }

  /**
   * @param \DOMNode $node
   *
   * @return \DOMNode|null
   */
  protected function getNextSiblingOfTypeDOMElement(\DOMNode $node)
  {
    do {
      $node = $node->nextSibling;
    } while (!($node === null || $node instanceof \DOMElement));

    return $node;
  }

  /**
   * Check if the current string is an conditional comment.
   *
   * INFO: since IE >= 10 conditional comment are not working anymore
   *
   * <!--[if expression]> HTML <![endif]-->
   * <![if expression]> HTML <![endif]>
   *
   * @param string $comment
   *
   * @return bool
   */
  private function isConditionalComment($comment): bool
  {
    if (preg_match('/^\[if [^\]]+\]/', $comment)) {
      return true;
    }

    if (preg_match('/\[endif\]$/', $comment)) {
      return true;
    }

    return false;
  }

  /**
   * @param string $html
   * @param bool   $decodeUtf8Specials <p>Use this only in special cases, e.g. for PHP 5.3</p>
   *
   * @return string
   */
  public function minify($html, $decodeUtf8Specials = false): string
  {
    $html = (string)$html;
    if (!isset($html[0])) {
      return '';
    }

    $html = trim($html);
    if (!$html) {
      return '';
    }

    // init
    static $CACHE_SELF_CLOSING_TAGS = null;
    if ($CACHE_SELF_CLOSING_TAGS === null) {
      $CACHE_SELF_CLOSING_TAGS = implode('|', self::$selfClosingTags);
    }

    // reset
    $this->protectedChildNodes = [];

    // save old content
    $origHtml = $html;
    $origHtmlLength = UTF8::strlen($html);

    // -------------------------------------------------------------------------
    // Minify the HTML via "HtmlDomParser"
    // -------------------------------------------------------------------------

    if ($this->doOptimizeViaHtmlDomParser === true) {
      $html = $this->minifyHtmlDom($html, $decodeUtf8Specials);
    }

    // -------------------------------------------------------------------------
    // Trim whitespace from html-string. [protected html is still protected]
    // -------------------------------------------------------------------------

    // Remove extra white-space(s) between HTML attribute(s)
    $html = (string)\preg_replace_callback(
        '#<([^\/\s<>!]+)(?:\s+([^<>]*?)\s*|\s*)(\/?)>#',
        function ($matches) {
          return '<' . $matches[1] . (string)\preg_replace('#([^\s=]+)(\=([\'"]?)(.*?)\3)?(\s+|$)#s', ' $1$2', $matches[2]) . $matches[3] . '>';
        },
        $html
    );


    if ($this->doRemoveSpacesBetweenTags === true) {
      // Remove spaces that are between > and <
      $html = (string)\preg_replace('/(>) (<)/', '>$2', $html);
    }

    // -------------------------------------------------------------------------
    // Restore protected HTML-code.
    // -------------------------------------------------------------------------

    $html = (string)\preg_replace_callback(
        '/<(?<element>' . $this->protectedChildNodesHelper . ')(?<attributes> [^>]*)?>(?<value>.*?)<\/' . $this->protectedChildNodesHelper . '>/',
        [$this, 'restoreProtectedHtml'],
        $html
    );

    // -------------------------------------------------------------------------
    // Restore protected HTML-entities.
    // -------------------------------------------------------------------------

    if ($this->doOptimizeViaHtmlDomParser === true) {
      $html = HtmlDomParser::putReplacedBackToPreserveHtmlEntities($html);
    }

    // ------------------------------------
    // Final clean-up
    // ------------------------------------

    $html = UTF8::cleanup($html);

    $html = \str_replace(
        [
            'html>' . "\n",
            "\n" . '<html',
            'html/>' . "\n",
            "\n" . '</html',
            'head>' . "\n",
            "\n" . '<head',
            'head/>' . "\n",
            "\n" . '</head',
        ],
        [
            'html>',
            '<html',
            'html/>',
            '</html',
            'head>',
            '<head',
            'head/>',
            '</head',
        ],
        $html
    );

    // self closing tags, don't need a trailing slash ...
    $replace = [];
    $replacement = [];
    foreach (self::$selfClosingTags as $selfClosingTag) {
      $replace[] = '<' . $selfClosingTag . '/>';
      $replacement[] = '<' . $selfClosingTag . '>';
959 26
      $replace[] = '<' . $selfClosingTag . ' />';
960 26
      $replacement[] = '<' . $selfClosingTag . '>';
961
    }
962 26
    $html = \str_replace(
963 26
        $replace,
964 26
        $replacement,
965 26
        $html
966
    );
967
968 26
    $html = (string)\preg_replace('#<\b(' . $CACHE_SELF_CLOSING_TAGS . ')([^>]+)><\/\b\1>#', '<\\1\\2>', $html);
969
970
    // ------------------------------------
971
    // check if compression worked
972
    // ------------------------------------
973
974 26
    if ($origHtmlLength < UTF8::strlen($html)) {
975 2
      $html = $origHtml;
976
    }
977
978 26
    return $html;
979
  }
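
The last two steps of minify(), dropping the trailing slash on self-closing tags and falling back to the input when the result grew, can be sketched in isolation. This is a minimal sketch under stated assumptions, not the class's own code: `stripTrailingSlash()` and `keepShorter()` are hypothetical helper names, the tag list is passed in rather than read from `self::$selfClosingTags`, and plain `strlen()` stands in for `UTF8::strlen()`.

```php
<?php

/**
 * Hypothetical helper: rewrite "<br/>" and "<br />" to "<br>" for every
 * tag name in $voidTags, using one plain string replacement pass.
 */
function stripTrailingSlash(string $html, array $voidTags): string
{
    $search = [];
    $replace = [];
    foreach ($voidTags as $tag) {
        $search[] = '<' . $tag . '/>';
        $replace[] = '<' . $tag . '>';
        $search[] = '<' . $tag . ' />';
        $replace[] = '<' . $tag . '>';
    }

    return str_replace($search, $replace, $html);
}

/**
 * Hypothetical helper: keep the minified candidate unless it is longer
 * than the original (byte length here; the listing measures with UTF8::strlen()).
 */
function keepShorter(string $original, string $candidate): string
{
    return strlen($candidate) > strlen($original) ? $original : $candidate;
}
```

The guard matters because some rewrites can add characters; returning the original keeps the output from ever being larger than the input.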
980
981
  /**
982
   * @param string $html
983
   * @param bool   $decodeUtf8Specials
984
   *
985
   * @return string
986
   */
987 25
  private function minifyHtmlDom($html, $decodeUtf8Specials): string
988
  {
989
    // init dom
990 25
    $dom = new HtmlDomParser();
991 25
    $dom->getDocument()->preserveWhiteSpace = false; // remove redundant white space
992 25
    $dom->getDocument()->formatOutput = false; // do not format the output with indentation
993
994
    // load dom
995 25
    $dom->loadHtml($html);
996
997 25
    $this->withDocType = (stripos(trim($html), '<!DOCTYPE') === 0);
998
999
    // -------------------------------------------------------------------------
1000
    // Protect HTML tags and conditional comments.
1001
    // -------------------------------------------------------------------------
1002
1003 25
    $dom = $this->protectTags($dom);
1004
1005
    // -------------------------------------------------------------------------
1006
    // Remove default HTML comments. [protected html is still protected]
1007
    // -------------------------------------------------------------------------
1008
1009 25
    if ($this->doRemoveComments === true) {
1010 24
      $dom = $this->removeComments($dom);
1011
    }
1012
1013
    // -------------------------------------------------------------------------
1014
    // Sum-Up extra whitespace from the Dom. [protected html is still protected]
1015
    // -------------------------------------------------------------------------
1016
1017 25
    if ($this->doSumUpWhitespace === true) {
1018 24
      $dom = $this->sumUpWhitespace($dom);
1019
    }
1020
1021 25
    foreach ($dom->find('*') as $element) {
The expression $dom->find('*') of type array<integer,object<vok...leHtmlDomNodeInterface> is not guaranteed to be traversable. How about adding an additional type check?

There are different options of fixing this problem.

  1. If you want to be on the safe side, you can add an additional type-check:

    $collection = json_decode($data, true);
    if ( ! is_array($collection)) {
        throw new \RuntimeException('$collection must be an array.');
    }

    foreach ($collection as $item) { /** ... */ }

  2. If you are sure that the expression is traversable, you might want to add a doc comment cast to improve IDE auto-completion and static analysis:

    /** @var array $collection */
    $collection = json_decode($data, true);

    foreach ($collection as $item) { /** .. */ }

1022
1023
      // -------------------------------------------------------------------------
1024
      // Optimize html attributes. [protected html is still protected]
1025
      // -------------------------------------------------------------------------
1026
1027 25
      if ($this->doOptimizeAttributes === true) {
1028 24
        $this->optimizeAttributes($element);
1029
      }
1030
1031
      // -------------------------------------------------------------------------
1032
      // Remove whitespace around tags. [protected html is still protected]
1033
      // -------------------------------------------------------------------------
1034
1035 25
      if ($this->doRemoveWhitespaceAroundTags === true) {
1036 25
        $this->removeWhitespaceAroundTags($element);
1037
      }
1038
    }
1039
1040
    // -------------------------------------------------------------------------
1041
    // Convert the Dom into a string.
1042
    // -------------------------------------------------------------------------
1043
1044 25
    $html = $dom->fixHtmlOutput(
1045 25
        $this->domNodeToString($dom->getDocument()),
1046 25
        $decodeUtf8Specials
1047
    );
1048
1049 25
    return $html;
1050
  }
1051
1052
  /**
1053
   * Sort HTML attributes so that gzip can compress the output better, and remove some default attributes.
1054
   *
1055
   * @param SimpleHtmlDom $element
1056
   *
1057
   * @return bool
1058
   */
1059 24
  private function optimizeAttributes(SimpleHtmlDom $element): bool
1060
  {
1061 24
    $attributes = $element->getAllAttributes();
1062 24
    if ($attributes === null) {
1063 24
      return false;
1064
    }
1065
1066 13
    $attrs = [];
1067 13
    foreach ((array)$attributes as $attrName => $attrValue) {
1068
1069
      // -------------------------------------------------------------------------
1070
      // Remove optional "http:"-prefix from attributes.
1071
      // -------------------------------------------------------------------------
1072
1073 13
      if ($this->doRemoveHttpPrefixFromAttributes === true) {
1074
        if (
1075 3
            ($attrName === 'href' || $attrName === 'src' || $attrName === 'action')
1076
            &&
1077 3
            !(isset($attributes['rel']) && $attributes['rel'] === 'external')
1078
            &&
1079 3
            !(isset($attributes['target']) && $attributes['target'] === '_blank')
1080
        ) {
1081 2
          $attrValue = \str_replace('http://', '//', $attrValue);
1082
        }
1083
      }
1084
1085 13
      if ($this->removeAttributeHelper($element->tag, $attrName, $attrValue, $attributes)) {
1086 3
        $element->{$attrName} = null;
1087 3
        continue;
1088
      }
1089
1090
      // -------------------------------------------------------------------------
1091
      // Sort css-class-names, for better gzip results.
1092
      // -------------------------------------------------------------------------
1093
1094 13
      if ($this->doSortCssClassNames === true) {
1095 13
        $attrValue = $this->sortCssClassNames($attrName, $attrValue);
1096
      }
1097
1098 13
      if ($this->doSortHtmlAttributes === true) {
1099 13
        $attrs[$attrName] = $attrValue;
1100 13
        $element->{$attrName} = null;
1101
      }
1102
    }
1103
1104
    // -------------------------------------------------------------------------
1105
    // Sort html-attributes, for better gzip results.
1106
    // -------------------------------------------------------------------------
1107
1108 13
    if ($this->doSortHtmlAttributes === true) {
1109 13
      \ksort($attrs);
1110 13
      foreach ($attrs as $attrName => $attrValue) {
1111 13
        $attrValue = HtmlDomParser::replaceToPreserveHtmlEntities($attrValue);
1112 13
        $element->setAttribute($attrName, $attrValue, true);
1113
      }
1114
    }
1115
1116 13
    return true;
1117
  }
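
The two attribute rewrites in optimizeAttributes(), stripping the "http:" prefix and sorting by attribute name, can be tried on a plain array. A minimal sketch, assuming attributes arrive as a name => value map; `normalizeAttributes()` is a hypothetical name and the DOM wrapper calls from the listing are left out.

```php
<?php

/**
 * Hypothetical helper: make "http://" URLs protocol-relative on
 * href/src/action (unless the element is marked rel="external" or
 * target="_blank"), then sort the attributes by name.
 */
function normalizeAttributes(array $attributes): array
{
    $isExternal = ($attributes['rel'] ?? null) === 'external'
        || ($attributes['target'] ?? null) === '_blank';

    foreach ($attributes as $name => $value) {
        if (!$isExternal && in_array($name, ['href', 'src', 'action'], true)) {
            $attributes[$name] = str_replace('http://', '//', $value);
        }
    }

    // A stable attribute order gives a compressor more repeated byte runs.
    ksort($attributes);

    return $attributes;
}
```

Sorting does not shrink the markup by itself; the gain, if any, comes from gzip finding the same attribute sequences again and again.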
1118
1119
  /**
1120
   * Prevent changes of inline "styles" and "scripts".
1121
   *
1122
   * @param HtmlDomParser $dom
1123
   *
1124
   * @return HtmlDomParser
1125
   */
1126 25
  private function protectTags(HtmlDomParser $dom): HtmlDomParser
1127
  {
1128
    // init
1129 25
    $counter = 0;
1130
1131 25
    foreach ($dom->find('script, style') as $element) {
The expression $dom->find('script, style') of type array<integer,object<vok...leHtmlDomNodeInterface> is not guaranteed to be traversable. How about adding an additional type check?
1132
1133
      // skip external resources, i.e. elements that have a "src" attribute
1134 3
      if ($element->tag === 'script' || $element->tag === 'style') {
1135 3
        $attributes = $element->getAllAttributes();
1136 3
        if (isset($attributes['src'])) {
1137 2
          continue;
1138
        }
1139
      }
1140
1141 2
      $this->protectedChildNodes[$counter] = $element->text();
1142 2
      $element->getNode()->nodeValue = '<' . $this->protectedChildNodesHelper . ' data-' . $this->protectedChildNodesHelper . '="' . $counter . '"></' . $this->protectedChildNodesHelper . '>';
1143
1144 2
      ++$counter;
1145
    }
1146
1147 25
    $dom->getDocument()->normalizeDocument();
1148
1149 25
    foreach ($dom->find('//comment()') as $element) {
The expression $dom->find('//comment()') of type array<integer,object<vok...leHtmlDomNodeInterface> is not guaranteed to be traversable. How about adding an additional type check?
1150 3
      $text = $element->text();
1151
1152
      // skip normal comments
1153 3
      if ($this->isConditionalComment($text) === false) {
1154 3
        continue;
1155
      }
1156
1157 2
      $this->protectedChildNodes[$counter] = '<!--' . $text . '-->';
1158
1159
      /* @var $node \DOMComment */
1160 2
      $node = $element->getNode();
1161 2
      $child = new \DOMText('<' . $this->protectedChildNodesHelper . ' data-' . $this->protectedChildNodesHelper . '="' . $counter . '"></' . $this->protectedChildNodesHelper . '>');
1162 2
      $element->getNode()->parentNode->replaceChild($child, $node);
1163
1164 2
      ++$counter;
1165
    }
1166
1167 25
    $dom->getDocument()->normalizeDocument();
1168
1169 25
    return $dom;
1170
  }
1171
1172
  /**
1173
   * Check if the attribute can be removed.
1174
   *
1175
   * @param string $tag
1176
   * @param string $attrName
1177
   * @param string $attrValue
1178
   * @param array  $allAttr
1179
   *
1180
   * @return bool
1181
   */
1182 13
  private function removeAttributeHelper($tag, $attrName, $attrValue, $allAttr): bool
1183
  {
1184
    // remove defaults
1185 13
    if ($this->doRemoveDefaultAttributes === true) {
1186
1187 1
      if ($tag === 'script' && $attrName === 'language' && $attrValue === 'javascript') {
1188
        return true;
1189
      }
1190
1191 1
      if ($tag === 'form' && $attrName === 'method' && $attrValue === 'get') {
1192
        return true;
1193
      }
1194
1195 1
      if ($tag === 'input' && $attrName === 'type' && $attrValue === 'text') {
1196
        return true;
1197
      }
1198
1199 1
      if ($tag === 'area' && $attrName === 'shape' && $attrValue === 'rect') {
1200
        return true;
1201
      }
1202
    }
1203
1204
    // remove deprecated charset-attribute (the browser will use the charset from the HTTP-Header, anyway)
1205 13
    if ($this->doRemoveDeprecatedScriptCharsetAttribute === true) {
This code seems to be duplicated across your project.

Duplicated code is one of the most pungent code smells. If you need to duplicate the same code in three or more different places, we strongly encourage you to look into extracting the code into a single class or operation.

1206 13
      if ($tag === 'script' && $attrName === 'charset' && !isset($allAttr['src'])) {
1207
        return true;
1208
      }
1209
    }
1210
1211
    // remove deprecated anchor-jump
1212 13
    if ($this->doRemoveDeprecatedAnchorName === true) {
This code seems to be duplicated across your project.
1213 13
      if ($tag === 'a' && $attrName === 'name' && isset($allAttr['id']) && $allAttr['id'] === $attrValue) {
1214
        return true;
1215
      }
1216
    }
1217
1218
    // remove "type=text/css" for css links
1219 13
    if ($this->doRemoveDeprecatedTypeFromStylesheetLink === true) {
This code seems to be duplicated across your project.
1220 13
      if ($tag === 'link' && $attrName === 'type' && $attrValue === 'text/css' && isset($allAttr['rel']) && $allAttr['rel'] === 'stylesheet') {
1221 1
        return true;
1222
      }
1223
    }
1224
1225
    // remove deprecated script-mime-types
1226 13
    if ($this->doRemoveDeprecatedTypeFromScriptTag === true) {
This code seems to be duplicated across your project.
1227 13
      if ($tag === 'script' && $attrName === 'type' && isset($allAttr['src'], self::$executableScriptsMimeTypes[$attrValue])) {
1228 1
        return true;
1229
      }
1230
    }
1231
1232
    // remove 'value=""' from <input type="text">
1233 13
    if ($this->doRemoveValueFromEmptyInput === true) {
1234 13
      if ($tag === 'input' && $attrName === 'value' && $attrValue === '' && isset($allAttr['type']) && $allAttr['type'] === 'text') {
1235 1
        return true;
1236
      }
1237
    }
1238
1239
    // remove some empty attributes
1240 13
    if ($this->doRemoveEmptyAttributes === true) {
1241 13
      if (\trim($attrValue) === '' && \preg_match('/^(?:class|id|style|title|lang|dir|on(?:focus|blur|change|click|dblclick|mouse(?:down|up|over|move|out)|key(?:press|down|up)))$/', $attrName)) {
1242 2
        return true;
1243
      }
1244
    }
1245
1246 13
    return false;
1247
  }
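
The four "remove defaults" checks at the top of removeAttributeHelper() share one shape (tag, attribute name, default value), so they can be written as a lookup table. A minimal sketch of that idea, not a drop-in replacement: `isDefaultAttribute()` is a hypothetical name and the table holds only the four pairs that appear in the listing.

```php
<?php

/**
 * Hypothetical helper: true when $value is the default value of
 * attribute $name on tag $tag, according to a small lookup table.
 */
function isDefaultAttribute(string $tag, string $name, string $value): bool
{
    // tag => [attribute => default value], taken from the listing's checks
    $defaults = [
        'script' => ['language' => 'javascript'],
        'form'   => ['method' => 'get'],
        'input'  => ['type' => 'text'],
        'area'   => ['shape' => 'rect'],
    ];

    return ($defaults[$tag][$name] ?? null) === $value;
}
```

A table keeps each rule on one line and makes new defaults a data change instead of another `if` block.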
1248
1249
  /**
1250
   * Remove comments in the dom.
1251
   *
1252
   * @param HtmlDomParser $dom
1253
   *
1254
   * @return HtmlDomParser
1255
   */
1256 24
  private function removeComments(HtmlDomParser $dom): HtmlDomParser
1257
  {
1258 24
    foreach ($dom->find('//comment()') as $commentWrapper) {
The expression $dom->find('//comment()') of type array<integer,object<vok...leHtmlDomNodeInterface> is not guaranteed to be traversable. How about adding an additional type check?
1259 3
      $comment = $commentWrapper->getNode();
1260 3
      $val = $comment->nodeValue;
1261 3
      if (\strpos($val, '[') === false) {
1262 3
        $comment->parentNode->removeChild($comment);
1263
      }
1264
    }
1265
1266 24
    $dom->getDocument()->normalizeDocument();
1267
1268 24
    return $dom;
1269
  }
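
removeComments() keeps every comment whose text contains a "[" character, which is how conditional comments such as "[if IE]" survive. The same rule can be sketched on a plain string. Note the swap: this sketch uses a regular expression instead of the DOM traversal in the listing, and `stripPlainComments()` is a hypothetical name.

```php
<?php

/**
 * Hypothetical helper: remove HTML comments, except those whose body
 * contains "[" (treated here as conditional comments and kept as-is).
 */
function stripPlainComments(string $html): string
{
    return (string)preg_replace_callback(
        '/<!--(.*?)-->/s',
        function (array $m): string {
            // $m[1] is the comment body, $m[0] the whole comment
            return strpos($m[1], '[') === false ? '' : $m[0];
        },
        $html
    );
}
```

The "[" test is a heuristic: an ordinary comment that happens to contain a bracket is kept too.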
1270
1271
  /**
1272
   * Collapse whitespace in the text nodes inside and directly around selected tags.
1273
   *
1274
   * @param SimpleHtmlDom $element
1275
   *
1276
   * @return void
1277
   */
1278 24
  private function removeWhitespaceAroundTags(SimpleHtmlDom $element)
1279
  {
1280 24
    if (isset(self::$trimWhitespaceFromTags[$element->tag])) {
1281 10
      $node = $element->getNode();
1282
1283 10
      $candidates = [];
1284 10
      if ($node->childNodes->length > 0) {
1285 9
        $candidates[] = $node->firstChild;
1286 9
        $candidates[] = $node->lastChild;
1287 9
        $candidates[] = $node->previousSibling;
1288 9
        $candidates[] = $node->nextSibling;
1289
      }
1290
1291 10
      foreach ($candidates as $candidate) {
1292 9
        if ($candidate === null) {
1293 5
          continue;
1294
        }
1295
1296 9
        if ($candidate->nodeType === 3) {
1297 9
          $candidate->nodeValue = \preg_replace(self::$regExSpace, ' ', $candidate->nodeValue);
1298
        }
1299
      }
1300
    }
1301 24
  }
1302
1303
  /**
1304
   * Callback function for preg_replace_callback use.
1305
   *
1306
   * @param array $matches PREG matches
1307
   *
1308
   * @return string
1309
   */
1310 2
  private function restoreProtectedHtml($matches): string
1311
  {
1312 2
    \preg_match('/.*"(?<id>\d*)"/', $matches['attributes'], $matchesInner);
1313
1314 2
    $html = '';
1315 2
    if (isset($this->protectedChildNodes[$matchesInner['id']])) {
1316 2
      $html .= $this->protectedChildNodes[$matchesInner['id']];
1317
    }
1318
1319 2
    return $html;
1320
  }
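
protectTags() and restoreProtectedHtml() form a round trip: content that must not be touched is swapped for a numbered placeholder, the minifier runs, and the placeholder is swapped back. A minimal sketch of that round trip, assuming a regex and a comment-style placeholder instead of the DOM nodes and helper tag used in the listing; `protectBlocks()` and `restoreBlocks()` are hypothetical names.

```php
<?php

/**
 * Hypothetical helper: replace the body of every <script>/<style>
 * element with a numbered placeholder and remember the body in $store.
 */
function protectBlocks(string $html, array &$store): string
{
    return (string)preg_replace_callback(
        '#<(script|style)\b([^>]*)>(.*?)</\1>#is',
        function (array $m) use (&$store): string {
            $id = count($store);
            $store[$id] = $m[3];

            return '<' . $m[1] . $m[2] . '><!--protected:' . $id . '--></' . $m[1] . '>';
        },
        $html
    );
}

/**
 * Hypothetical helper: put the remembered bodies back in place of
 * their placeholders; unknown ids become an empty string.
 */
function restoreBlocks(string $html, array $store): string
{
    return (string)preg_replace_callback(
        '#<!--protected:(\d+)-->#',
        function (array $m) use ($store): string {
            return $store[(int)$m[1]] ?? '';
        },
        $html
    );
}
```

Any whitespace rewrite applied between the two calls cannot alter the script or style body, because the body is not in the string at that point.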
1321
1322
  /**
1323
   * @param array $domainsToRemoveHttpPrefixFromAttributes
1324
   *
1325
   * @return $this
1326
   */
1327 2
  public function setDomainsToRemoveHttpPrefixFromAttributes($domainsToRemoveHttpPrefixFromAttributes)
1328
  {
1329 2
    $this->domainsToRemoveHttpPrefixFromAttributes = $domainsToRemoveHttpPrefixFromAttributes;
1330
1331 2
    return $this;
1332
  }
1333
1334
  /**
1335
   * @param string $attrName
1336
   * @param string $attrValue
1337
   *
1338
   * @return string
1339
   */
1340 13
  private function sortCssClassNames($attrName, $attrValue): string
1341
  {
1342 13
    if ($attrName !== 'class' || !$attrValue) {
1343 12
      return $attrValue;
1344
    }
1345
1346 8
    $classes = \array_unique(
1347 8
        \explode(' ', $attrValue)
1348
    );
1349 8
    \sort($classes);
1350
1351 8
    $attrValue = '';
1352 8
    foreach ($classes as $class) {
1353
1354 8
      if (!$class) {
1355 2
        continue;
1356
      }
1357
1358 8
      $attrValue .= \trim($class) . ' ';
1359
    }
1360 8
    $attrValue = \trim($attrValue);
1361
1362 8
    return $attrValue;
1363
  }
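
The class-name handling in sortCssClassNames() comes down to split, de-duplicate, sort, join. A compact sketch of the same idea follows; `sortClasses()` is a hypothetical name, and it splits on any whitespace with `preg_split()` where the listing uses `explode(' ', ...)` and skips empty entries in a loop.

```php
<?php

/**
 * Hypothetical helper: return the class names of $value without
 * duplicates, sorted, and separated by single spaces.
 */
function sortClasses(string $value): string
{
    // PREG_SPLIT_NO_EMPTY drops the empty strings left by repeated spaces
    $classes = preg_split('/\s+/', $value, -1, PREG_SPLIT_NO_EMPTY);
    $classes = array_unique($classes);
    sort($classes);

    return implode(' ', $classes);
}
```

For example, `sortClasses('b a  b c')` yields `a b c`.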
1364
1365
  /**
1366
   * Sum-up extra whitespace from dom-nodes.
1367
   *
1368
   * @param HtmlDomParser $dom
1369
   *
1370
   * @return HtmlDomParser
1371
   */
1372 24
  private function sumUpWhitespace(HtmlDomParser $dom): HtmlDomParser
1373
  {
1374 24
    $textnodes = $dom->find('//text()');
1375 24
    foreach ($textnodes as $textnodeWrapper) {
The expression $textnodes of type array<integer,object<vok...leHtmlDomNodeInterface> is not guaranteed to be traversable. How about adding an additional type check?
1376
      /* @var $textnode \DOMNode */
1377 20
      $textnode = $textnodeWrapper->getNode();
1378 20
      $xp = $textnode->getNodePath();
1379
1380 20
      $doSkip = false;
1381 20
      foreach (self::$skipTagsForRemoveWhitespace as $pattern) {
1382 20
        if (\strpos($xp, "/$pattern") !== false) {
1383 3
          $doSkip = true;
1384 20
          break;
1385
        }
1386
      }
1387 20
      if ($doSkip) {
1388 3
        continue;
1389
      }
1390
1391 20
      $textnode->nodeValue = \preg_replace(self::$regExSpace, ' ', $textnode->nodeValue);
1392
    }
1393
1394 24
    $dom->getDocument()->normalizeDocument();
1395
1396 24
    return $dom;
1397
  }
1398
}
1399